AI/LLM Engineer — Reasoning Layer over Retrieval (RAG)

Budget: $15.0 - $25.0 HOURLY / FULL_TIME ⭐ 4.48 (172) United States

ruby-on-rails, postgresql, python, artificial-intelligence, computer-vision, node.js, natural-language-processing

We're building a production AI system and looking for an engineer who is genuinely strong in modern LLM reasoning architecture.

To be clear about what this is not: this is not a search project, and not a "wrap an API and ship a chatbot" project. Plain vector search returns documents; it doesn't reason. We're building a reasoning layer that sits on top of retrieval: a system that takes a user's question, works out what's actually being asked, pulls the right grounding from a knowledge base, and produces an answer that reflects real multi-step reasoning over that material, not a nearest-neighbor lookup with an LLM stapled on top. That distinction is the whole job.

The retrieval layer is necessary infrastructure, and you need to understand it deeply, but it's the floor, not the product. Ours is already hybrid (vector + BM25 + reciprocal rank fusion) with a reranker, and we move by measuring: every change runs through an eval harness, before and after. The hard and interesting work is everything above that floor.

What you'd actually work on

- Interpreting intent: figuring out what the technician is really asking before retrieving anything.
- Context assembly and budgeting: deciding what the model actually needs and packing it intelligently, not stuffing the window.
- Multi-turn reasoning and grounded, well-cited answers that are meaningfully better than raw-LLM or naive-RAG output.
- Multimodal ingestion of messy source material (OEM technical manuals): tables and figures parsed into structured, retrievable chunks and cross-referenced so they don't become orphaned context.
- A flagship R&D thread: reading wiring and schematic diagrams, pulling out the symbols and the routes and connections between them, and turning a drawing into something a reasoning system can actually use. It's an open, genuinely hard problem at the intersection of vision and reasoning, and a real part of this role.
- Debugging the full failure surface, from "the question was misunderstood" to "the right grounding was retrieved but the answer still missed."

If you've ever debugged why a reasoning pipeline gave a shallow answer despite having the right material in context, you'll recognize this work immediately.

Stack & languages: pragmatic, not dogmatic

We work across multiple languages and services and reach for whatever fits the job rather than forcing one stack. We care far more about your LLM-systems depth than your primary language. Strong engineers who've built real reasoning systems are exactly who we want to hear from, whatever you write them in. Be comfortable dropping into an unfamiliar codebase, bring the right tool, and be pragmatic.

Who we're looking for

- You've built this kind of reasoning-over-retrieval system before and can explain why it produces what it produces.
- You understand the retrieval floor (embeddings, hybrid search, rerankers) deeply enough to know its limits.
- You measure before and after, instinctively, not because you were told to.
- You scope realistically and communicate clearly in writing. We work async-first.
