AI/ML Engineer - LLM Orchestration (Long-term, Multi-Vertical AI Platforms) $600-$1000/mth Full Time
Budget: $600.0
FIXED /
⭐ 5.00 (7)
Australia
artificial-intelligence, machine-learning
---
**About us**
We are an Australian AI automation business. We build vertical AI platforms for regulated industries - solutions where generic horizontal AI tools (ChatGPT, Claude, Gemini) cannot compete because the workflow is too specific, the compliance burden is too high, and the data layer is too proprietary.
We have multiple vertical platforms in build and pilot. Each one is a multi-engine AI orchestration layer with industry-specific surfaces on top. Each one trains on the customer's own data so the longer they use it, the more useless any competitor becomes.
Your work will span multiple verticals over time. The first project on your plate will be one of our active builds. Future projects will include other verticals as they come online. You are not being hired for one product. You are being hired for the AI orchestration capability across the company.
**The role**
We have a working product team: 3 full-time full-stack engineers (Next.js, React, Postgres, 2 years of workflow-AI experience), a product lead, and design partners across verticals. We want to add a full-time AI/ML specialist who owns the engine orchestration layer, sets the architectural pattern, and ensures production-grade quality across every vertical AI engine we ship.
Each vertical has between 6 and 12 distinct AI engines. Each engine is a learning system that trains on the customer's specific data over time. Examples of engine types you will build (across various verticals): conversion prediction, intent scoring, churn prediction, opportunity mining, performance pattern intelligence, cross-matching, lifecycle stage classification, negotiation intelligence, campaign intelligence, root-cause analysis.
These engines are essential. They are not features. A competitor running the same algorithm on different data learns different patterns. There is no shared training pool that levels the field. Your job is to make sure these engines are built right - not just functional.
**What you will build**
Your job is to architect and ship vertical AI engines for our active builds. Each vertical has 6-12 engines, each engine learns on the customer's specific data, and each engine needs to be built right - not just functional.
Your first major delivery is one representative engine end-to-end as the architectural template the rest of the team follows. This includes:
- Data ingestion pipeline from the customer's source systems
- LLM orchestration using Claude and/or OpenAI with structured outputs
- Evaluation harness against labeled historical outcomes
- Cost model and observability (per-prediction cost, total monthly cost at scale)
- Prompt versioning, drift detection, and quality monitoring in production
- Architecture documentation the full-stack team replicates from
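As a rough illustration of the cost-model item in the list above, the sketch below derives per-prediction cost and total monthly cost from token counts. The class name, function names, token counts, and prices are all hypothetical placeholders chosen for the example; they are not real provider rates and not taken from our codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens. Values used below are placeholders, not real rates."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_prediction(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one LLM call, given its input and output token counts."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def monthly_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 predictions_per_month: int) -> float:
    """Total monthly cost in USD, assuming every prediction has the same token profile."""
    return cost_per_prediction(price, input_tokens, output_tokens) * predictions_per_month


# Hypothetical engine: 2,000 input tokens and 200 output tokens per prediction.
placeholder_price = ModelPrice(input_per_mtok=1.0, output_per_mtok=5.0)
per_prediction = cost_per_prediction(placeholder_price, 2_000, 200)
per_month = monthly_cost(placeholder_price, 2_000, 200, predictions_per_month=100_000)
```

A real cost model would also need to account for retries, cached tokens, and varying prompt sizes, which this sketch deliberately ignores.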
After the architectural template is in place, you move into ongoing engine development across verticals. You design the engines, pair on the hard ones, and review the rest as the team builds. Realistic pace once the pattern is set: roughly one engine every 3-4 weeks across the team.
This is full-time long-term employment, not a fixed-scope contract.
**The stack**
- LLMs: Anthropic Claude (primary), OpenAI GPT (secondary). Gemini and open-source models only if specifically justified.
- Orchestration: Tool use / function calling, structured outputs via Zod or Pydantic schemas. LangChain and LangGraph optional; we prefer direct API calls unless an orchestration framework adds clear value.
- Retrieval: Postgres with pgvector as default. Pinecone or Qdrant only if scale justifies.
- Backend: Python (FastAPI) or Node.js. We are open to either.
- Frontend (context only): Next.js 14+ App Router, TypeScript strict, Tailwind, shadcn/ui. Your engine outputs will be surfaced by the full-stack team.
- Database: Postgres (Supabase-hosted). Multi-tenant with row-level security.
- Evaluation: Custom eval harness or Braintrust / Langfuse / similar. You will own this.
- Deployment: AWS or Vercel Functions. Managed infrastructure (Supabase, Inngest for queues, Clerk for auth).
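To make "structured outputs" in the Orchestration item concrete, here is a minimal sketch of validating a model's JSON reply before it reaches the rest of the system. It uses only the Python standard library as a stand-in for a Pydantic or Zod schema so that it runs without dependencies. The `IntentScore` type, its field names, and the 0-10 range are assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentScore:
    """Hypothetical structured output for one scored customer."""
    customer_id: str
    score: int        # assumed range: 0 to 10 inclusive
    rationale: str


_EXPECTED_FIELDS = {"customer_id": str, "score": int, "rationale": str}


def parse_intent_score(raw: str) -> IntentScore:
    """Parse and validate a raw LLM reply; raise ValueError on any schema violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(_EXPECTED_FIELDS):
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    for name, expected_type in _EXPECTED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} has the wrong type")
    if not 0 <= data["score"] <= 10:
        raise ValueError("score must be between 0 and 10")
    return IntentScore(**data)


parsed = parse_intent_score('{"customer_id": "c-1", "score": 7, "rationale": "recent enquiry"}')
```

Rejecting malformed replies at this boundary means a bad generation surfaces as an explicit error that can be retried or logged, rather than as a silently wrong value downstream.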
**Hard requirements**
Do not apply unless all five of these are true. Applications missing any of these will be rejected.
1. You have shipped LLM-orchestrated systems to production in the last 18 months. Not prototypes. Not internal tools. Not hackathon projects. Production systems serving real users at scale. You can name the project, describe the architecture, and explain what broke and how you fixed it.
2. You think in RAG + structured outputs + evaluation, not fine-tuning. You know when fine-tuning is actually the right answer (rarely) and when it is the wrong answer (usually). You default to prompt engineering, structured outputs, and retrieval augmentation for 95 percent of problems.
3. You have set up evaluation harnesses for LLM outputs in production. You can describe how you built labeled datasets, measured accuracy / hallucination / drift, and made improvement decisions based on eval results. You understand why evals matter more than benchmarks.
4. You have managed LLM cost at scale. You know how to route between models for cost (Haiku for cheap, Sonnet for standard, Opus for hard). You have used prompt caching, batch APIs, and retrieval to cut token costs. You think in dollars-per-thousand-predictions, not just accuracy.
5. You can lead, not just execute. You can design an architectural pattern, document it, and have 3 mid-level engineers replicate it under your review. You welcome code review as a gating mechanism, not a nuisance.
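The model-routing idea in requirement 4 can be sketched as follows. This is only an illustration under stated assumptions: the difficulty score, the thresholds, and the per-prediction costs are hypothetical, and the tier labels are plain strings rather than real model identifiers.

```python
# (upper difficulty bound, tier label). Thresholds are illustrative assumptions.
TIERS = [(0.3, "haiku"), (0.8, "sonnet"), (1.0, "opus")]

# Placeholder USD cost per prediction for each tier, not real pricing.
PLACEHOLDER_COST = {"haiku": 0.001, "sonnet": 0.004, "opus": 0.02}


def route(difficulty: float) -> str:
    """Pick the cheapest tier whose threshold covers a difficulty score in [0, 1]."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    for upper_bound, tier in TIERS:
        if difficulty <= upper_bound:
            return tier
    return TIERS[-1][1]  # unreachable while the last bound is 1.0; kept as a safeguard


def dollars_per_thousand(difficulties: list[float]) -> float:
    """Blended cost per 1,000 predictions for a sample of difficulty scores."""
    if not difficulties:
        raise ValueError("need at least one difficulty score")
    total = sum(PLACEHOLDER_COST[route(d)] for d in difficulties)
    return 1000 * total / len(difficulties)


blended = dollars_per_thousand([0.1, 0.5, 0.95])
```

How the difficulty score itself is produced (a heuristic, a cheap classifier, or a first-pass model call) is the hard part in practice and is left open here.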
**Nice to have**
- Experience with voice pipelines (Whisper, Deepgram, AssemblyAI) - several of our verticals use voice capture as a primary data feed
- Multi-tenant SaaS background (per-customer data isolation, row-level security)
- Vertical AI experience (legal, healthcare, real estate, fintech, professional services)
- Strong written English - you will document patterns the team builds from
- Comfort working across multiple products simultaneously, not just one codebase
**Disqualifiers**
Do not apply if any of these describes you:
- Available for less than 25 hours per week, or unable to commit to a 6-month engagement
- Basic full-stack developer with no experience in prompting, LLM, or AI work
- Generalist who has used ChatGPT for side projects but never shipped an LLM system to paying customers
- Traditional ML / data science background and a preference for training custom models from scratch
- Computer vision or NLP-research background with limited production deployment experience (i.e., we need hands-on technical development experience)
**Communication and logistics**
- Primary tools: GitHub, Linear, Slack, Notion, Cursor
- Working hours: any timezone, but 2-4 hours overlap with AEDT (Sydney) required for pairing and standups
- Reporting: directly to the CEO during the first 30 days, then to the product lead and tech lead once the team is coordinated
- Meetings: 2 per week maximum (weekly architecture review, weekly pairing session)
**How to apply**
Your application should answer these four questions. Copy-paste cover letters that do not address these will be rejected immediately.
- Question 1: Name one production LLM-orchestrated system you shipped in the last 18 months. What it does, what scale it runs at, and one specific bug or scaling challenge you debugged.
- Question 2: I want to build a system that scores every customer in a CRM for purchase intent (0-10 score) based on their engagement patterns (event attendance, enquiry cadence, email opens, voice note mentions). Give me your 90-second architectural answer: how would you build it?
- Question 3: Describe the evaluation harness you would build for the customer intent scoring engine above. How do you know if it is accurate? How do you catch hallucinations and silently degraded outputs?
- Question 4: Your minimum hours per week you can commit, and your earliest start date.
Applications that quote a rate without answering questions 1-3 will be rejected. We are hiring on depth of thinking, not polish of profile.
**What we offer**
- Long-term employment
- Variety - work across multiple vertical AI products as they ship
- Reporting directly to the CEO and Tech Lead during the first 30 days
- Ownership of the most technically interesting layer of every product (the engines, the moat)
- A team that respects engineering judgment and has moved past the 'CRUD with AI bolted on' conversation
- This is a full-time role with a starting salary of $600-$1000 per month for full-time hours. Salary will be reviewed for an increase after 6 months, depending on performance.