Build Hallucination-Free Clinical RAG with Adversarial Multi-Agent Verification (LangGraph)

Budget: $500.0 FIXED / ⭐ 0.00 (0) Egypt

Project Overview
We are building a next-generation Clinical Decision Support System that goes far beyond basic document Q&A. Unlike standard chatbots that retrieve a single chunk and generate a reply, this system must behave like a medical review panel: it must draft recommendations, actively attack them with contradictory evidence, and produce a final, citation-forced verdict. I need an elite-level AI Engineer who understands that real-world healthcare AI is not about prompt engineering; it is about system architecture, stateful agentic workflows, and hallucination mitigation through adversarial verification.

The Core Problem to Solve
Current RAG systems hallucinate when faced with conflicting medical literature. A 2020 study says Drug X works; a 2025 study says it fails in elderly patients. A naive RAG pipeline picks one at random. We need a Hierarchical Agentic System that:
1. Identifies contradictions before answering.
2. Weighs evidence based on study power, recency, and design.
3. Never outputs a claim without directly citing the source chunk that supports it.

Advanced Technical Features Required (The "How")
This is not a beginner project. The successful candidate will architect and implement the following production-grade pipeline:

1. Semantic Document Ingestion & Multi-Layer Indexing
- Ingest 3 distinct document layers: Clinical Trial PDFs, Real-World Case Reports, and Local Hospital Guidelines.
- Implement Semantic Chunking (splitting by medical headings such as "Methodology" and "Results") rather than fixed-character splitting.

2. Hybrid Retrieval with Cross-Encoder Re-ranking
- Build a hybrid retriever combining BM25 (keyword) + Dense Vector Search (semantic) so that no critical medical term is missed.
- Integrate a Cross-Encoder Re-ranker to squeeze the top 20 retrieved chunks down to the 5 most relevant per layer. This drastically reduces noise in the LLM context window.

3. Stateful Multi-Agent Orchestration (LangGraph)
You will build a cyclic, stateful workflow with 3 specialized agents that communicate via a shared message state:
- Agent 1 (The Drafter): Constructs an initial medical recommendation with inline citations using the top retrieved evidence.
- Agent 2 (The Devil's Advocate / Critic): Armed with a Web Search Tool (Tavily/PubMed API) and an internal vector store, this agent actively hunts for papers, case studies, or data that directly contradict Agent 1's draft.
- Agent 3 (The Adjudicator / Judge): This agent evaluates both arguments against strict heuristics: recency (preferring 2025 over 2010), sample size (n=1000 over n=20), and study design (double-blind RCT over retrospective case reports). It writes the final, authoritative verdict.

4. Chain-of-Verification
- Before delivering the final output, implement a strict validation guardrail. The system must re-query the original retrieved chunks for every cited claim in the final draft. If a claim lacks a supporting source, the system forces a rewrite on the spot. Zero hallucinations permitted.

5. Production Tooling
- Integrate tool-calling (e.g., a calculator for dosing, a date parser for timeline analysis) so the agents can perform complex reasoning steps autonomously.

Tech Stack Required
You must be highly proficient with:
- LangGraph (core: for cyclic agent handoffs and state management).
- LangChain / LangChain-Community (prompt templating & output parsers).
- ChromaDB / Weaviate (vector store).
- Sentence-Transformers (embeddings) + Rank-BM25 (keyword) + Cross-Encoder (re-ranking).
- Unstructured + PyPDF (clean PDF extraction).
- Tavily API or BeautifulSoup + Requests (for the Critic agent's web search tool).
- Pydantic (for strict output structuring of agent responses).

Project Deliverables (What You Get)
- A fully functional Python repository with a README explaining how to run the pipeline.
- A performance report measuring retrieval hit-rate and hallucination reduction on a provided test dataset.
- Clean, modular code with proper error handling (not a spaghetti Jupyter notebook).

Who We Are Looking For (Your Value Proposition)
To succeed in this project, you must check all these boxes:
- You do not treat LLMs as magic boxes. You treat them as fragile reasoning engines that need strict systems engineering to function safely.
- You have deep, proven experience with LangGraph's state management. You can show me how you have managed complex loops, conditional edges, and inter-agent handoffs in past work.
- You understand healthcare constraints. You prioritize deterministic guardrails (Cross-Encoder re-ranking, study-power heuristics) over ad-hoc prompt tinkering, because you know that in this industry a hallucination costs a life, not just a bad review.
- You are an API-first developer. You know how to build robust FastAPI services with background workers, and you can structure code so that our frontend team can integrate with it effortlessly.
- You provide architectural diagrams and design docs before writing a single line of production code. You value transparency and daily progress logs.

How to Apply
If you are the elite engineer we are looking for, please send a proposal that includes:
1. A brief description of your experience with LangGraph-based multi-agent systems.
2. Your approach to handling long-running async agentic loops in a production API.
3. A rough high-level architecture sketch (just 1 paragraph) for how you would structure the "Drafter → Critic → Adjudicator" flow.
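Sketch for feature 1 (Semantic Chunking). Splitting "by medical headings" can be approximated by treating any line that consists only of a known section name as a chunk boundary. The heading list, the function name `chunk_by_headings`, and the "Preamble"/"Unsectioned" labels are assumptions for illustration; in the real pipeline the input text would come from Unstructured/PyPDF extraction, and each chunk would be indexed with its layer and section as metadata.

```python
import re
from typing import Dict, List

# Hypothetical heading vocabulary; tune per document layer (trials, case reports, guidelines).
MEDICAL_HEADINGS = ["Abstract", "Background", "Methodology", "Methods",
                    "Results", "Discussion", "Conclusion"]


def chunk_by_headings(text: str, headings: List[str] = MEDICAL_HEADINGS) -> List[Dict[str, str]]:
    """Split a document at lines that consist only of a known section heading."""
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(map(re.escape, headings)) + r")[ \t]*:?[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    if not matches:
        # No recognisable headings: keep the whole text as a single chunk.
        return [{"section": "Unsectioned", "text": text.strip()}] if text.strip() else []

    chunks: List[Dict[str, str]] = []
    preamble = text[: matches[0].start()].strip()
    if preamble:  # title, authors, etc. before the first heading
        chunks.append({"section": "Preamble", "text": preamble})
    for i, match in enumerate(matches):
        # Each section runs until the next heading (or the end of the document).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        if body:
            chunks.append({"section": match.group(1).title(), "text": body})
    return chunks
```

Keeping "Methodology" and "Results" as separate chunks lets the Adjudicator later read study design and outcomes from the sections that actually state them, instead of from an arbitrary character window.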
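Sketch for the first half of feature 2 (hybrid BM25 + dense retrieval). The posting does not say how the two result lists are merged; this sketch assumes Reciprocal Rank Fusion (RRF, k=60), which is one common choice, not a stated requirement. BM25 is written out in plain Python so the scoring is visible; in the listed stack, `rank_bm25.BM25Okapi(tokenized_corpus).get_scores(query_tokens)` would supply the lexical scores and a Sentence-Transformers model's `encode` would play the role of the `embed` callable, which is a placeholder here.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence

Embed = Callable[[str], List[float]]  # placeholder for a real embedding model


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # document-length normalisation
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rrf_fuse(rankings: List[List[int]], k: int = 60) -> List[int]:
    """Reciprocal Rank Fusion: sum 1/(k + rank) over every ranking a document appears in."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query: str, docs: Sequence[str], embed: Embed, top_k: int = 20) -> List[int]:
    """Return indices of the top_k documents after fusing keyword and dense rankings."""
    lexical = bm25_scores(query, docs)
    q_vec = embed(query)
    dense = [cosine(q_vec, embed(d)) for d in docs]
    lex_rank = sorted(range(len(docs)), key=lambda i: lexical[i], reverse=True)
    dense_rank = sorted(range(len(docs)), key=lambda i: dense[i], reverse=True)
    return rrf_fuse([lex_rank, dense_rank])[:top_k]
```

RRF only uses ranks, so it avoids having to calibrate BM25 scores against cosine similarities; the trade-off is that it ignores how large the score gaps are.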
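Sketch for the second half of feature 2 (Cross-Encoder re-ranking, top 20 down to 5 per layer). A cross-encoder scores each (query, chunk) pair jointly, so the sketch takes a batch scorer as a parameter. In the listed stack that scorer could be `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` from sentence-transformers, which accepts a list of (query, passage) pairs; the model choice, the `layer` metadata key, and the function name `rerank_per_layer` are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Chunk = Dict[str, str]  # assumed keys: "layer", "text"
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_per_layer(query: str, candidates: Sequence[Chunk], score_pairs: PairScorer,
                     top_in: int = 20, top_out: int = 5) -> Dict[str, List[Tuple[float, Chunk]]]:
    """Re-score each layer's first top_in candidates against the query and keep top_out."""
    by_layer: Dict[str, List[Chunk]] = {}
    for chunk in candidates:  # candidates are assumed to arrive in hybrid-retrieval order
        by_layer.setdefault(chunk["layer"], []).append(chunk)

    reranked: Dict[str, List[Tuple[float, Chunk]]] = {}
    for layer, chunks in by_layer.items():
        pool = chunks[:top_in]
        scores = score_pairs([(query, c["text"]) for c in pool])  # one batched call per layer
        ranked = sorted(zip((float(s) for s in scores), pool),
                        key=lambda pair: pair[0], reverse=True)
        reranked[layer] = ranked[:top_out]
    return reranked
```

Re-ranking per layer (rather than over the merged pool) guarantees that local hospital guidelines still reach the Drafter even when trial PDFs dominate the raw scores.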
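Sketch for feature 3 (the cyclic Drafter → Critic → Adjudicator workflow). To stay self-contained, this uses a tiny hand-written graph runner, `MiniGraph`, which is NOT the LangGraph API; in LangGraph the same shape would be built with `StateGraph`, `add_node`, `add_edge`, `add_conditional_edges`, and `compile()`. The agents are stubs, and two details are assumptions: that failed verification routes back to the Drafter, and that a revision budget (`MAX_REVISIONS`) caps the loop.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]
END = "__end__"
MAX_REVISIONS = 3  # hypothetical loop budget so the cycle always terminates


class MiniGraph:
    """Tiny stand-in for a LangGraph-style state graph; not the LangGraph API."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, str] = {}
        self.routers: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = dst

    def add_conditional_edge(self, src: str, router: Callable[[State], str]) -> None:
        self.routers[src] = router

    def run(self, entry: str, state: State, max_steps: int = 50) -> State:
        current, steps = entry, 0
        while current != END:
            if steps >= max_steps:
                raise RuntimeError("step budget exhausted before reaching END")
            state = {**state, **self.nodes[current](state)}  # merge the node's partial update
            router = self.routers.get(current)
            current = router(state) if router else self.edges[current]
            steps += 1
        return state


# Stub agents: each returns only the keys it changes, like a LangGraph node.
def drafter(state: State) -> State:
    revision = state["revision"] + 1
    return {"revision": revision, "draft": f"Recommendation v{revision} [chunk:trial-12]"}


def critic(state: State) -> State:
    return {"critiques": state["critiques"] + [f"Contradicting evidence for v{state['revision']}"]}


def adjudicator(state: State) -> State:
    return {"verdict": f"Verdict on: {state['draft']}"}


def verifier(state: State) -> State:
    # Stub: pretend the first draft fails citation verification and the second passes.
    return {"verified": state["revision"] >= 2}


def route_after_verification(state: State) -> str:
    if state["verified"] or state["revision"] >= MAX_REVISIONS:
        return END
    return "drafter"  # assumption: unsupported claims send the flow back to the Drafter


def build_panel() -> MiniGraph:
    graph = MiniGraph()
    for name, fn in [("drafter", drafter), ("critic", critic),
                     ("adjudicator", adjudicator), ("verifier", verifier)]:
        graph.add_node(name, fn)
    graph.add_edge("drafter", "critic")
    graph.add_edge("critic", "adjudicator")
    graph.add_edge("adjudicator", "verifier")
    graph.add_conditional_edge("verifier", route_after_verification)
    return graph
```

Usage: `build_panel().run("drafter", {"question": "...", "revision": 0, "critiques": [], "verified": False})`. Because critiques accumulate in shared state, the second draft can address what the Critic found in the first round.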
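Sketch for the Adjudicator's heuristics (recency, sample size, study design). The posting names the three factors but not how to combine them; this sketch assumes a multiplicative score of design weight × exponential recency decay × log-scaled sample size, with a hypothetical 10-year half-life and made-up design weights that would need clinical sign-off. The "uncertain" outcome is also an assumption, meant to surface conflicts rather than pick a side arbitrarily.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical weights for the study-design hierarchy; real values need clinical sign-off.
DESIGN_WEIGHT = {"double_blind_rct": 1.0, "rct": 0.8, "cohort": 0.6,
                 "retrospective": 0.4, "case_report": 0.2}


@dataclass
class Evidence:
    source_id: str
    year: int
    sample_size: int
    design: str
    supports_draft: bool


def evidence_weight(e: Evidence, current_year: int = 2025, half_life_years: float = 10.0) -> float:
    """Design weight x recency decay x log-scaled sample size."""
    recency = 0.5 ** (max(current_year - e.year, 0) / half_life_years)
    power = math.log10(max(e.sample_size, 0) + 1)  # n=1000 -> ~3.0, n=20 -> ~1.3
    return DESIGN_WEIGHT.get(e.design, 0.1) * recency * power


def adjudicate(evidence: List[Evidence], margin: float = 1.5) -> str:
    """Compare pooled weight for and against the draft; require a clear margin to decide."""
    support = sum(evidence_weight(e) for e in evidence if e.supports_draft)
    against = sum(evidence_weight(e) for e in evidence if not e.supports_draft)
    if support > margin * against:
        return "supported"
    if against > margin * support:
        return "contradicted"
    return "uncertain"  # evidence too balanced: surface the conflict instead of picking a side
```

With these placeholder weights, the posting's own example plays out as expected: a 2025 double-blind RCT with n=1000 outweighs a 2010 retrospective report with n=20 by more than an order of magnitude.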
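Sketch for feature 4 (Chain-of-Verification). Each sentence of the final draft is treated as a claim; claims without a citation, or whose cited chunks share too few content words with the claim, are flagged and sent back for a forced rewrite. The `[chunk:ID]` citation format, the 0.6 overlap threshold, and the stopword list are assumptions, and lexical overlap is a deliberately crude stand-in for a stronger support check (for example an LLM or NLI entailment judgment against the re-queried chunk). The `rewrite` callable represents the Drafter and is stubbed in the usage.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

CITATION = re.compile(r"\[chunk:([\w-]+)\]")  # assumed inline citation format
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "or", "the", "to", "with"}


def content_words(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_claims(draft: str) -> List[str]:
    """Naive sentence split: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]


def find_unsupported(draft: str, chunks: Dict[str, str], min_overlap: float = 0.6) -> List[str]:
    """Return claims with no citation, or whose cited chunks share too few content words."""
    unsupported = []
    for claim in split_claims(draft):
        cited_ids = CITATION.findall(claim)
        words = content_words(CITATION.sub("", claim))
        if not cited_ids or not words:
            unsupported.append(claim)
            continue
        # Re-query the original retrieved chunks behind every citation in this claim.
        evidence = set().union(*(content_words(chunks.get(cid, "")) for cid in cited_ids))
        if len(words & evidence) / len(words) < min_overlap:
            unsupported.append(claim)
    return unsupported


def verify_and_rewrite(draft: str, chunks: Dict[str, str],
                       rewrite: Callable[[str, List[str]], str],
                       max_attempts: int = 3) -> Tuple[str, List[str]]:
    """Re-check every claim and force rewrites until clean or out of attempts."""
    for _ in range(max_attempts):
        unsupported = find_unsupported(draft, chunks)
        if not unsupported:
            return draft, []
        draft = rewrite(draft, unsupported)
    return draft, find_unsupported(draft, chunks)
```

Returning the leftover unsupported claims (instead of silently passing) lets the API refuse to deliver a verdict when the rewrite budget runs out.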
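Sketch for feature 5 (Production Tooling). Deterministic tools keep arithmetic and date math out of the LLM: the agent emits a tool call, and plain code computes the answer. The tool names, the `{"name": ..., "args": {...}}` call shape, and the capped weight-based dosing rule are hypothetical; in LangChain, functions like these could be wrapped with the `@tool` decorator from `langchain_core.tools`, and any real dosing parameters would have to come from a cited guideline chunk.

```python
from datetime import date
from typing import Any, Callable, Dict


def weight_based_dose(weight_kg: float, mg_per_kg: float, max_dose_mg: float) -> float:
    """Weight-based dose capped at max_dose_mg; all inputs must come from a cited source."""
    if weight_kg <= 0 or mg_per_kg <= 0 or max_dose_mg <= 0:
        raise ValueError("weight, mg/kg and max dose must be positive")
    return round(min(weight_kg * mg_per_kg, max_dose_mg), 2)


def years_between(earlier_iso: str, later_iso: str) -> float:
    """Approximate years between two ISO dates (YYYY-MM-DD), e.g. publication dates."""
    delta = date.fromisoformat(later_iso) - date.fromisoformat(earlier_iso)
    return round(delta.days / 365.25, 1)


# Registry the agents can call by name; tool names are hypothetical.
TOOLS: Dict[str, Callable[..., Any]] = {
    "weight_based_dose": weight_based_dose,
    "years_between": years_between,
}


def dispatch(call: Dict[str, Any]) -> Any:
    """Execute a tool call shaped like {"name": ..., "args": {...}}."""
    name = call.get("name")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("args", {}))
```

Rejecting unknown tool names and invalid inputs with exceptions (rather than guessing) gives the orchestrator a clear signal to re-prompt the agent.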
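Sketch for the performance-report deliverable. "Retrieval hit-rate" is assumed here to mean hit-rate@k (the share of test queries whose top-k results contain at least one gold chunk), and "hallucination reduction" is assumed to be the relative drop in the unsupported-claim rate versus a baseline such as naive single-pass RAG. The test-set structure (query ID → ranked chunk IDs, query ID → gold chunk IDs) is hypothetical, since the provided dataset's format is not described.

```python
from typing import Dict, List, Sequence, Set


def hit_rate_at_k(retrieved: Dict[str, Sequence[str]], relevant: Dict[str, Set[str]],
                  k: int = 5) -> float:
    """Fraction of test queries whose top-k retrieved chunk IDs include a gold chunk."""
    if not relevant:
        return 0.0
    hits = sum(1 for query, gold in relevant.items()
               if gold & set(list(retrieved.get(query, []))[:k]))
    return hits / len(relevant)


def hallucination_rate(unsupported: List[int], total: List[int]) -> float:
    """Unsupported claims divided by all claims, pooled over the test set."""
    claims = sum(total)
    return sum(unsupported) / claims if claims else 0.0


def relative_reduction(baseline_rate: float, system_rate: float) -> float:
    """Relative drop in hallucination rate versus a baseline pipeline."""
    return (baseline_rate - system_rate) / baseline_rate if baseline_rate else 0.0
```

Reporting hit-rate at k=20 and k=5 separately shows whether misses come from the hybrid retriever or from the re-ranker's cut.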
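Sketch for "long-running async agentic loops in a production API". A Drafter → Critic → Adjudicator cycle can take far longer than a normal HTTP request, so one common pattern is to accept the question, return a job ID immediately, run the loop in the background, and let the frontend poll. This in-memory `JobStore` is an illustrative assumption (a production service would use a durable queue and store); hypothetical FastAPI routes such as `POST /verdicts` (calls `submit`) and `GET /verdicts/{job_id}` (calls `status`) would wrap it from async endpoints, and FastAPI's own `BackgroundTasks` is an alternative for lighter workloads.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict, Set

Pipeline = Callable[[str], Awaitable[Any]]


class JobStore:
    """In-memory job registry (illustrative); production would use a durable queue and store."""

    def __init__(self, timeout: float = 300.0) -> None:
        self.timeout = timeout
        self.jobs: Dict[str, Dict[str, Any]] = {}
        self._tasks: Set[asyncio.Task] = set()

    def submit(self, pipeline: Pipeline, question: str) -> str:
        """Schedule the pipeline on the running event loop and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "queued", "result": None, "error": None}
        task = asyncio.create_task(self._run(job_id, pipeline, question))
        self._tasks.add(task)  # hold a reference so the task is not garbage-collected mid-run
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: str, pipeline: Pipeline, question: str) -> None:
        job = self.jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = await asyncio.wait_for(pipeline(question), timeout=self.timeout)
            job["status"] = "done"
        except asyncio.TimeoutError:
            job["status"] = "timeout"
        except Exception as exc:  # record the failure instead of crashing the worker
            job["status"] = "failed"
            job["error"] = str(exc)

    def status(self, job_id: str) -> Dict[str, Any]:
        return self.jobs.get(job_id, {"status": "unknown"})
```

The timeout bounds runaway agent loops at the API layer, independently of any revision budget inside the graph itself.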
