AI Data Agent Engineer — Agentic Analytics Platform (Semantic Layer · NL2SQL · Evals · MCP)

Budget: $70 to $150 per hour · Part-time · Client rating ⭐ 5.00 (12) · United States

Skills: Python, data science, artificial intelligence

What we’re building

We’re building an agentic analytics platform: an AI data analyst that lets business users ask questions in plain English and get trustworthy answers directly from their live data warehouse. The architecture follows the modern analytics stack:
• Modern cloud data warehouse
• Governed semantic layer with certified metrics, entity relationships, and business glossary
• LLM agent grounded in that semantic layer
• Exposed through MCP (Model Context Protocol)

This isn’t a demo or research project. The platform is already live, with paying customers running production workloads. We’re looking for someone who has already solved this problem in production: someone who understands the difference between generating SQL and producing numbers a CFO will actually trust.

The problem you’ll own

Generating SQL with an LLM is relatively easy. Building an AI analyst that returns the same correct, governed answer every time is the hard part. That requires:
• A governed semantic layer with certified metric definitions, so that “Revenue,” “Occupancy,” or any other business metric has exactly one meaning.
• Context engineering for structured data, including curated catalogs, table and column documentation, retrieval context, and keeping irrelevant information out of the model.
• Retrieval that actually works, using embeddings and semantic search across schema metadata, business glossaries, and previously verified queries, so the agent consistently selects the correct tables and business definitions.
• Eval-driven development with golden datasets, reconciliation against trusted customer reports, regression testing, answer quality metrics, freshness validation, coverage checks, and faithfulness evaluation.
• Strong data engineering fundamentals, including modern ELT practices, incremental pipelines, data contracts, freshness monitoring, schema drift detection, and reliable warehouse modeling.
What you’ll do
• Design and evolve the semantic and context layer between the warehouse and the LLM, including metric definitions, entity relationships, business glossary, and catalog curation.
• Build and improve the retrieval pipeline using embeddings, vector search, and metadata RAG to ground SQL generation.
• Own answer quality end to end by building reconciliation frameworks, evaluation suites, and regression tests, and by running root cause analysis whenever a number is incorrect.
• Onboard new data sources using modern ELT practices and prepare them for agent use through documentation, cataloging, governance, and quality validation.
• Work alongside AI coding agents every day. Much of the platform is built agentically, and you’ll guide, review, and strengthen their output.

You’re probably a great fit if you have:
• Built or operated a production LLM-over-structured-data system such as text-to-SQL, NL2SQL, conversational BI, semantic-layer-powered AI, or a data copilot. You can discuss real production failures and how you solved them.
• Hands-on experience with several modern agentic analytics technologies, and opinions about all of them:
   • Databricks Unity Catalog and AI/BI Genie
   • Snowflake Cortex Analyst
   • dbt Semantic Layer
   • Cube
   • Looker / LookML
   • LangChain or LlamaIndex SQL agents
   • Vector search (pgvector or similar)
   • MCP servers and tools
• Expert SQL and strong Python skills. You can compare legacy reports with AI-generated SQL, identify why the numbers differ, and fix the correct layer instead of masking symptoms.
• A deep understanding of warehouse architecture, ELT patterns, incremental loading, data contracts, data quality testing, and medallion-style modeling.
• An evaluation-first mindset. You believe in golden datasets, regression testing, LLM-as-judge where appropriate, and measuring answer quality instead of guessing.
Nice to have
• Healthcare or regulated data experience (HIPAA / PHI)
• Cloud experience, preferably GCP (Cloud Run, Cloud SQL), although AWS or Azure is also fine
• Experience reverse engineering legacy BI systems such as Power BI (DAX) or SSRS to recover business logic

What this is NOT
• Not a dashboard or BI developer role. The AI agent replaces dashboards.
• Not prompt engineering in isolation. The engineering challenge is in the data, semantic, and context layers.
• Not a research project. This is production software with paying customers.

Engagement
• Approximately 20 hours per week to start
• Initial 1- to 3-month contract, with the opportunity to grow into a long-term or full-time role
• Fully remote
• At least 3 hours of daily overlap with US Eastern time
• You’ll begin in sandbox environments with cloned databases. Production access is earned over time.

To apply
Please answer these four questions directly:
1. Describe an LLM-over-data system you’ve built or operated in production. What prevented hallucinated numbers, and what was the worst incorrect answer you encountered?
2. How do you decide what belongs in the semantic layer versus the warehouse model versus the prompt or context? Please give a real example.
3. A user asks a question and the agent returns a dramatically inflated number because it selected a plausible but incorrect join path across raw tables. What’s your systemic fix, not just the one-off patch?
4. What’s your hourly rate and current availability?

We read every application carefully. Generic proposals and AI-generated cover letters that don’t answer these questions will be declined without review.
