AI Engineer / RAG Pipeline Developer for Compliance Law Management Information System
Budget: $10.00 - $40.00
Hourly / Part-time
⭐ 5.00 (3)
United States
database-architecture, python, artificial-intelligence, amazon-web-services
Key Responsibilities
You will be responsible for building an end-to-end pipeline including:
1. Data Collection & Crawling
- Design and implement web crawling pipelines for legal/compliance sources
- Extract structured and unstructured legal content from websites and portals
- Respect robots.txt directives and any legal constraints on scraping (e.g., site terms of use)
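As an illustration of the robots.txt requirement, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It checks a crawl queue against a robots.txt file that has already been downloaded. The crawler name `ComplianceBot`, the `example.gov` URLs, and the helper names `build_robot_parser` and `filter_allowed` are hypothetical. A production crawler (Scrapy, Playwright, etc.) would fetch robots.txt per host and honor the crawl delay between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and crawler name, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""
USER_AGENT = "ComplianceBot"


def build_robot_parser(robots_txt: str, robots_url: str) -> RobotFileParser:
    """Parse an already-downloaded robots.txt so rules can be checked offline."""
    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return rp


def filter_allowed(urls: list[str], rp: RobotFileParser,
                   user_agent: str = USER_AGENT) -> list[str]:
    """Keep only the URLs this user agent is permitted to fetch."""
    return [url for url in urls if rp.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rp = build_robot_parser(SAMPLE_ROBOTS, "https://example.gov/robots.txt")
    queue = ["https://example.gov/regs/part-1", "https://example.gov/private/memo"]
    print(filter_allowed(queue, rp))   # ['https://example.gov/regs/part-1']
    print(rp.crawl_delay(USER_AGENT))  # 5 (seconds to wait between requests)
```

Separating "fetch robots.txt" from "check rules" keeps the permission logic testable without network access.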
2. Document Processing (PDF + Text)
- Build a robust PDF parsing and extraction pipeline using tools such as Docling
- Handle complex legal documents (tables, footnotes, multi-column layouts)
- Clean, normalize, and structure extracted content for downstream AI use
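The clean-and-normalize step might look roughly like the following minimal sketch. It assumes the parser (Docling or an equivalent) has already produced plain text per page, and it deliberately does not use any Docling API. The helper names and the 60% threshold for detecting running headers and footers are hypothetical choices.

```python
import re
import unicodedata
from collections import Counter


def strip_repeated_lines(pages: list[str], min_fraction: float = 0.6) -> list[str]:
    """Remove lines (e.g. running headers/footers) that recur on most pages.

    Heuristic: may also drop legitimately repeated lines such as "(a) Reserved."
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, int(len(pages) * min_fraction))
    return [
        "\n".join(line for line in page.splitlines() if counts[line.strip()] < threshold)
        for page in pages
    ]


def normalize_text(text: str) -> str:
    """Normalize Unicode, re-join hyphenated line breaks, and reflow paragraphs."""
    text = unicodedata.normalize("NFKC", text)  # e.g. the "ﬁ" ligature becomes "fi"
    # Heuristic: "regu-\nlation" -> "regulation". This also merges genuine
    # compounds split at a line end ("risk-\nbased" -> "riskbased").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    paragraphs = re.split(r"\n\s*\n", text)  # blank lines separate paragraphs
    reflowed = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in reflowed if p)
```

Tables and footnotes usually need the parser's own structured output rather than text heuristics like these.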
3. RAG Pipeline Development
- Design and implement Retrieval-Augmented Generation architecture
- Develop chunking strategies optimized for legal/compliance content
- Generate embeddings and enrich chunks with metadata
- Implement query understanding and LLM-based response synthesis
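One common chunking strategy for legal text is to split at section headings first and only then pack paragraphs up to a size limit. This keeps every chunk inside a single section and lets the section label travel with it as metadata. The following is a minimal sketch of that idea. The heading pattern ("§ 1.2" or "Section 3" at the start of a line), the `Chunk` fields, and the 1,200-character limit are all hypothetical and would need tuning against the real sources.

```python
import re
from dataclasses import dataclass

# Hypothetical heading pattern: a line starting with "§ 1.2" or "Section 3".
HEADING = re.compile(r"^(?:§\s*\d+(?:\.\d+)*|Section\s+\d+(?:\.\d+)*)", re.MULTILINE)


@dataclass
class Chunk:
    doc_id: str
    section: str      # section label, stored as metadata for filtering and citation
    chunk_index: int  # position of the chunk within the document
    text: str


def split_sections(text: str) -> list[tuple[str, str]]:
    """Split a document into (section_label, body) pairs at heading lines."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return [("unsectioned", text.strip())]
    sections = []
    preamble = text[:matches[0].start()].strip()
    if preamble:
        sections.append(("preamble", preamble))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(0).strip(), text[m.start():end].strip()))
    return sections


def chunk_document(doc_id: str, text: str, max_chars: int = 1200) -> list[Chunk]:
    """Pack paragraphs into chunks without ever crossing a section boundary.

    A single paragraph longer than max_chars is kept whole (oversized chunk).
    """
    chunks: list[Chunk] = []
    for label, body in split_sections(text):
        buf = ""
        for para in body.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            if buf and len(buf) + 2 + len(para) > max_chars:
                chunks.append(Chunk(doc_id, label, len(chunks), buf))
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(Chunk(doc_id, label, len(chunks), buf))
    return chunks
```

Each `Chunk` would then be embedded and stored together with its `doc_id` and `section`, so answers can point back to a specific provision.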
4. Vector Database (Pinecone)
- Set up and optimize Pinecone vector database
- Design indexing schema (metadata, filters, namespaces)
- Optimize retrieval speed and accuracy
- Implement hybrid search if needed (keyword + vector)
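The indexing and hybrid-search items can be sketched as follows, with no Pinecone client calls. `build_record` shapes a chunk as the `id` / `values` / `metadata` dictionary that Pinecone upserts accept. `jurisdiction_filter` builds a metadata filter using Pinecone's `$eq` / `$gte` operators. The metadata fields themselves (`jurisdiction`, `effective_year`, and so on) are a hypothetical schema. For the hybrid part, the sketch uses reciprocal rank fusion (RRF) to merge a keyword ranking with a vector ranking. RRF is one option among several; Pinecone's native sparse-dense vectors are another. `k = 60` is the commonly used default.

```python
from collections import defaultdict


def build_record(chunk_id: str, embedding: list[float], *, doc_id: str, section: str,
                 jurisdiction: str, effective_year: int, source_url: str) -> dict:
    """Shape one chunk as an id/values/metadata record for upserting."""
    return {
        "id": chunk_id,
        "values": list(embedding),
        # Hypothetical schema: flat string/number fields so each can be filtered on.
        "metadata": {
            "doc_id": doc_id,
            "section": section,
            "jurisdiction": jurisdiction,
            "effective_year": effective_year,
            "source_url": source_url,
        },
    }


def jurisdiction_filter(jurisdiction: str, min_year: int) -> dict:
    """Metadata filter using Pinecone-style $eq / $gte operators."""
    return {"jurisdiction": {"$eq": jurisdiction}, "effective_year": {"$gte": min_year}}


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each id scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # ids from a vector query
    keyword_hits = ["c", "a", "d"]  # ids from a keyword (e.g. BM25) search
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['a', 'c', 'b', 'd']
```

RRF uses only rank positions, so it avoids normalizing keyword scores and cosine similarities onto a common scale.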
5. AI/LLM Integration
- Integrate LLMs (OpenAI / open-source models)
- Engineer prompts for compliance/legal reasoning
- Ensure traceability and citation-backed responses
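Traceability can be enforced partly in the prompt and partly after generation. The following is a minimal sketch, independent of any particular LLM API. It numbers the retrieved sources in the prompt, asks for `[n]` citations, and then maps the `[n]` markers in the model's answer back to chunk ids, flagging citation numbers that point at no source. The prompt wording, the `Source` fields, and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    chunk_id: str  # id of the retrieved chunk in the vector index
    citation: str  # human-readable reference shown to the user
    text: str


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each retrieved source so the model can cite it as [n]."""
    lines = [
        "Answer the compliance question using ONLY the numbered sources below.",
        "Cite every claim with its source number in brackets, e.g. [1].",
        "If the sources do not answer the question, say so.",
        "",
    ]
    lines += [f"[{i}] ({s.citation}) {s.text}" for i, s in enumerate(sources, start=1)]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, sources: list[Source]) -> dict:
    """Map [n] markers in an answer back to chunk ids and flag invalid ones."""
    cited = {int(n) for n in CITATION.findall(answer)}
    valid = set(range(1, len(sources) + 1))
    return {
        "chunk_ids": [sources[n - 1].chunk_id for n in sorted(cited & valid)],
        "invalid": sorted(cited - valid),  # numbers that match no source
        "uncited": not (cited & valid),    # True if nothing verifiable was cited
    }
```

An answer that comes back `uncited` or with `invalid` markers could be regenerated or shown with a warning rather than returned as-is.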
Required Skills
- Strong experience building RAG systems in production
- Hands-on experience with Pinecone or other vector databases
- Experience with PDF parsing tools (Docling, PyMuPDF, Unstructured, etc.)
- Strong Python backend development skills
- Experience with web scraping/crawling frameworks (Scrapy, Playwright, etc.)
- Familiarity with LLM APIs (OpenAI, Anthropic, or open-source models)
- Understanding of embeddings, vector search, and semantic retrieval
- Experience handling large-scale document pipelines
Nice to Have
- Experience with legal tech or compliance systems
- Knowledge of information retrieval / NLP
- Experience with LangChain, LlamaIndex, or similar frameworks
- Cloud deployment (AWS/GCP/Azure)
- Docker / Kubernetes experience
Deliverables
- Fully functional ingestion + crawling pipeline
- PDF processing system using Docling or equivalent
- Pinecone vector database setup with optimized schema
- Working RAG system with API endpoints
- Documentation of architecture and setup
- Optional: simple UI for testing queries
Project Type
- Short-term MVP with potential for long-term extension
- Possibility of ongoing development and scaling
How to Apply
Please include:
- Relevant experience building RAG systems
- Examples of similar AI or document intelligence projects
- Your preferred stack for RAG pipelines
- Any experience with legal/compliance data systems