AI/ML Engineer for Document Intelligence (RAG, Chatbot, Agent Orchestration)

Budget: $25.0 - $90.0 HOURLY / FULL_TIME ⭐ 5.00 (6) PRT

artificial-intelligence, machine-learning, python, chatbot-development, orchestration

We're building an internal AI assistant that lets internal users ask natural-language questions over a mixed document corpus (PDFs, SOPs, training material, internal wikis, and a long tail of mixed-quality files) and get back accurate, source-cited answers matched across domains and sections.

This is a senior role. We're looking for someone who has shipped production RAG systems before, not just wired up a demo, and who understands that the difficulty isn't the LLM call. It's ingestion quality, retrieval precision, hallucination control, and having an evaluation loop that turns "it feels better" into a measurable number. If you've built this kind of system end to end and have war stories about chunk boundaries, OCR garbage, and retrieval that looked great in the demo and fell apart on real questions, you're who we want.

You'll own three connected pieces (a document ingestion agent, a retrieval pipeline, and a client-facing chat interface) from architecture through deployment.

🔧 Key Responsibilities

RAG & Search Architecture
- Design, implement, and optimize Retrieval-Augmented Generation (RAG) pipelines and vector embedding workflows over our internal document corpus
- Build semantic search and contextual Q&A that returns accurate, source-cited answers rather than plausible-sounding guesses
- Implement hybrid retrieval (keyword + dense vector) with a reranking stage, tuned for precision on real internal questions
- Manage chunking strategy, embedding selection, and index design, benchmarked against our actual content, not assumed defaults

Document Ingestion & Data Pipelines
- Build a document ingestion agent that handles structured and unstructured inputs: clean PDFs, scanned files, office documents, SOPs, and mixed-quality sources
- Implement structure-preserving chunking, OCR fallback for image-based documents, and metadata extraction (source, section, page) for precise citations
- Support incremental re-indexing so newly uploaded documents are searchable without full rebuilds
- Where relevant, add structured extraction from messy documents into validated schemas with per-field confidence scoring

LLM Integration & Orchestration
- Design and implement LLM-powered application logic using APIs (OpenAI, Anthropic/Claude, Gemini, or open-source alternatives) for our specific business use cases
- Connect and orchestrate models behind a clean abstraction so the underlying LLM is swappable
- Build prompt chains and tool-routing logic; where it adds value, implement AI agents that route across document RAG, SQL, and metadata sources
- Add guardrails to mitigate hallucinations and handle "I don't know" gracefully

System Optimization
- Proactively improve system latency, minimize API and token costs, and maximize response accuracy in live use (caching, batching, token-usage discipline)
- Make and document fundamental trade-offs: cost vs. latency vs. precision/recall

Evaluation & Testing
- Build and maintain a rigorous evaluation framework: a labelled golden question set with measurable retrieval-precision and answer-accuracy scores
- Track performance benchmarks over time so improvements are numbers, not impressions
- Continuously assess retrieval quality and regression as the corpus grows

Chat Application & Backend
- Develop backend services in Python (FastAPI / Flask) with secure, high-performance REST APIs
- Build a responsive, user-friendly chat interface for non-technical internal users
- Implement streaming responses (WebSockets / SSE), multi-turn conversation memory, and inline source references on every answer
- Design and maintain databases, an authentication system, and per-user access

Deployment, Scalability & Observability
- Package and deploy scalable AI services to our chosen cloud (AWS / GCP / Azure) or VPS using Docker
- Ensure scalability, observability, and performance monitoring: structured logging, error tracking, and latency and cost dashboards from day one
- Deliver complete technical documentation so the system is maintainable after handoff

🔷 Required Skills (Must-Have)
- Software engineering depth: Several years of professional experience with a proven track record of shipping AI/ML solutions to production, not just prototypes
- Python backend engineering: Deep, hands-on Python and modern backend architecture; production experience with FastAPI or Flask
- Production RAG: Hands-on experience building production-grade RAG systems (embeddings, chunking, retrieval tuning, reranking, citation grounding)
- Vector databases: Practical experience managing vector databases (Pinecone, Qdrant, Weaviate, FAISS, or Chroma) and the judgment to select based on scale, ops, and data-isolation needs
- LLMs: Hands-on experience with LLMs (GPT, Claude, Gemini, or open-source) and practical expertise in prompt engineering and LLM orchestration
- Core fundamentals: Solid understanding of embeddings, tokenization, semantic search, and retrieval mechanics
- Orchestration frameworks: Working knowledge of LangChain, LlamaIndex, Haystack, or a custom agentic framework
- APIs & data: Strong experience with REST APIs, JSON-based systems, and both structured and unstructured data pipelines
- Cloud & DevOps: Familiarity with cloud platforms (AWS / GCP / Azure), Docker, and deployment automation
- ML metrics: Comfort with fundamental ML metrics and trade-offs (cost, latency, precision/recall) and the ability to reason about them out loud
- Streaming: Knowledge of real-time streaming (WebSockets, SSE)
- Version control: Fluent with Git and modern version-control workflows
- Communication & reliability: Excellent, structured English; comfortable explaining architectural choices and leading technical discussions; high responsiveness and steady availability during agreed working hours

🔷 Good to Have (Bonus Skills)
- Agentic frameworks: Experience building multi-agent or multi-tool systems with LangGraph, CrewAI, AutoGen, or similar, covering planning, memory, tool usage, and autonomous task execution
- Structured extraction: LLM-driven extraction from unstructured documents into validated schemas, with confidence scoring and human-review routing for low-confidence outputs
- Self-hosted stacks: Ollama, Open WebUI, or other on-prem/self-hosted AI deployments where data can't leave the building
- Fine-tuning: Experience with model fine-tuning, LoRA / PEFT, or quantization
- NLP / ML background: Prior work in NLP, search engines, or recommendation systems
- Frontend: Comfort building the UI yourself (React, Next.js, or similar) so the chat experience is clean end to end
- Domain constraints: Experience navigating technical constraints in high-scale APIs, Enterprise SaaS, Fintech, or Healthcare
- Cost optimization: Demonstrated AI cost-optimization strategies in production
- Async architecture: Event-driven services and asynchronous processing pipelines for ingestion and long-running tasks

🔷 Project Scope

Build a production-grade, AI-powered document Q&A system: ingest our internal corpus, implement semantic search and contextual Q&A with citations, and ship a chat interface our team actually uses to resolve questions faster.

Deliverables
- Document ingestion pipeline (structure-preserving, OCR-capable, incremental)
- Knowledge-base indexing system
- RAG retrieval layer with source citations and hallucination guardrails
- Chat web interface with streaming and authentication
- Evaluation set + accuracy and latency benchmarks
- Monitoring, logging, and cost-tracking
- Deployment and full technical documentation

What We Offer
- A real, scoped project with a clear outcome, not an open-ended research exercise
- Direct communication with decision-makers, fast approvals, no committee
- Strong potential for ongoing work: maintenance, corpus expansion, and new internal AI tools once this ships
- A team that values measurable results and clean documentation over hype

To Apply, Please Include
- A description of your recent experience with similar RAG or document-chatbot projects
- 1–2 concrete examples (link, repo, or short write-up)
- Your preferred tech stack for this project, and why you'd choose it
Project type: One-time project with ongoing support needed. Potential long-term collaboration.

Open job