AI/ML Engineer for Document Intelligence (RAG, Chatbot, Agent Orchestration)
Budget: $25.0 - $90.0
HOURLY / FULL_TIME
⭐ 5.00 (6)
PRT
artificial-intelligence, machine-learning, python, chatbot-development, orchestration
We're building an internal AI assistant that lets users ask natural-language questions over a mixed document corpus (PDFs, SOPs, training material, internal wikis, and a long tail of mixed-quality files) and get back accurate, source-cited answers matched across domains and sections.
This is a senior role. We're looking for someone who has shipped production RAG systems before — not just wired up a demo — and who understands that the difficulty isn't the LLM call. It's ingestion quality, retrieval precision, hallucination control, and having an evaluation loop that turns "it feels better" into a measurable number. If you've built this kind of system end to end and have war stories about chunk boundaries, OCR garbage, and retrieval that looked great in the demo and fell apart on real questions, you're who we want.
You'll own three connected pieces, from architecture through deployment: a document ingestion agent, a retrieval pipeline, and a client-facing chat interface.
🔧 Key Responsibilities
RAG & Search Architecture
Design, implement, and optimize Retrieval-Augmented Generation (RAG) pipelines and vector embedding workflows over our internal document corpus
Build semantic search and contextual Q&A that returns accurate, source-cited answers rather than plausible-sounding guesses
Implement hybrid retrieval (keyword + dense vector) with a reranking stage, tuned for precision on real internal questions
Manage chunking strategy, embedding selection, and index design — benchmarked against our actual content, not assumed defaults
Document Ingestion & Data Pipelines
Build a document ingestion agent that handles structured and unstructured inputs: clean PDFs, scanned files, office documents, SOPs, and mixed-quality sources
Implement structure-preserving chunking, OCR fallback for image-based documents, and metadata extraction (source, section, page) for precise citations
Support incremental re-indexing so newly uploaded documents are searchable without full rebuilds
Where relevant, add structured extraction from messy documents into validated schemas with per-field confidence scoring
LLM Integration & Orchestration
Design and implement LLM-powered application logic using APIs (OpenAI, Anthropic/Claude, Gemini, or open-source alternatives) for our specific business use cases
Connect and orchestrate models behind a clean abstraction so the underlying LLM is swappable
Build prompt chains and tool-routing logic; where it adds value, implement AI agents that route across document RAG, SQL, and metadata sources
Add guardrails to mitigate hallucinations and handle "I don't know" gracefully
System Optimization
Proactively reduce system latency, minimize API and token costs, and maximize response accuracy in live use (caching, batching, token-usage discipline)
Make and document fundamental trade-offs: cost vs. latency vs. precision/recall
Evaluation & Testing
Build and maintain a rigorous evaluation framework: a labelled golden question set with measurable retrieval-precision and answer-accuracy scores
Track performance benchmarks over time so improvements are numbers, not impressions
Continuously monitor retrieval quality and catch regressions as the corpus grows
Chat Application & Backend
Develop backend services in Python (FastAPI / Flask) with secure, high-performance REST APIs
Build a responsive, user-friendly chat interface for non-technical internal users
Implement streaming responses (WebSockets / SSE), multi-turn conversation memory, and inline source references on every answer
Design and maintain databases, an authentication system, and per-user access control
Deployment, Scalability & Observability
Package and deploy scalable AI services to our chosen cloud (AWS / GCP / Azure) or VPS using Docker
Ensure scalability, observability, and performance monitoring — structured logging, error tracking, latency and cost dashboards from day one
Deliver complete technical documentation so the system is maintainable after handoff
🔷 Required Skills (Must-Have)
Software engineering depth: Several years of professional experience with a proven track record of shipping AI/ML solutions to production, not just prototypes
Python backend engineering: Deep, hands-on Python and modern backend architecture; production experience with FastAPI or Flask
Production RAG: Hands-on experience building production-grade RAG systems — embeddings, chunking, retrieval tuning, reranking, citation grounding
Vector databases: Practical experience managing vector databases (Pinecone, Qdrant, Weaviate, FAISS, or Chroma) and the judgment to select based on scale, ops, and data-isolation needs
LLMs: Hands-on experience with LLMs (GPT, Claude, Gemini, or open-source) and practical expertise in prompt engineering and LLM orchestration
Core fundamentals: Solid understanding of embeddings, tokenization, semantic search, and retrieval mechanics
Orchestration frameworks: Working knowledge of LangChain, LlamaIndex, Haystack, or a custom agentic framework
APIs & data: Strong experience with REST APIs, JSON-based systems, and both structured and unstructured data pipelines
Cloud & DevOps: Familiarity with cloud platforms (AWS / GCP / Azure), Docker, and deployment automation
ML metrics: Comfort with fundamental ML metrics and trade-offs — cost, latency, precision/recall — and the ability to reason about them out loud
Streaming: Knowledge of real-time streaming (WebSockets, SSE)
Version control: Fluent with Git and modern version-control workflows
Communication & reliability: Excellent, structured English; comfortable explaining architectural choices and leading technical discussions; high responsiveness and steady availability during agreed working hours
🔷 Good to Have (Bonus Skills)
Agentic frameworks: Experience building multi-agent or multi-tool systems with LangGraph, CrewAI, AutoGen, or similar — planning, memory, tool usage, autonomous task execution
Structured extraction: LLM-driven extraction from unstructured documents into validated schemas, with confidence scoring and human-review routing for low-confidence outputs
Self-hosted stacks: Ollama, Open WebUI, or other on-prem/self-hosted AI deployments where data can't leave the building
Fine-tuning: Experience with model fine-tuning, LoRA / PEFT, or quantization
NLP / ML background: Prior work in NLP, search engines, or recommendation systems
Frontend: Comfort building the UI yourself (React, Next.js, or similar) so the chat experience is clean end to end
Domain constraints: Experience navigating technical constraints in high-scale APIs, Enterprise SaaS, Fintech, or Healthcare
Cost optimization: Demonstrated AI cost-optimization strategies in production
Async architecture: Event-driven services and asynchronous processing pipelines for ingestion and long-running tasks
🔷 Project Scope
Build a production-grade, AI-powered document Q&A system: ingest our internal corpus, implement semantic search and contextual Q&A with citations, and ship a chat interface our team actually uses to resolve questions faster.
Deliverables
Document ingestion pipeline (structure-preserving, OCR-capable, incremental)
Knowledge-base indexing system
RAG retrieval layer with source citations and hallucination guardrails
Chat web interface with streaming and authentication
Evaluation set + accuracy and latency benchmarks
Monitoring, logging, and cost-tracking
Deployment and full technical documentation
What We Offer
A real, scoped project with a clear outcome — not an open-ended research exercise
Direct communication with decision-makers, fast approvals, no committee
Strong potential for ongoing work: maintenance, corpus expansion, and new internal AI tools once this ships
A team that values measurable results and clean documentation over hype
To Apply, Please Include
Describe your recent experience with similar RAG or document-chatbot projects. Share 1–2 concrete examples (link, repo, or short write-up).
Your preferred tech stack for this project, and why you'd choose it.
Project type: One-time project with follow-up support needed. Potential long-term collaboration.