Senior Engineer / AI Agent Orchestrator

Budget: $10.00–$12.00 hourly · Full time · ⭐ 4.95 (31) · United States

artificial-intelligence, python, machine-learning, natural-language-processing, artificial-neural-networks, data-science

You won't be writing the code by hand. You'll be directing AI coding agents to build it and holding their output to a production bar.

We're a small, focused team building a deterministic, cryptographically verifiable incident-attribution platform: it turns production incidents into signed, append-only compliance evidence (patent pending). The product is live, written in TypeScript, and moving fast.

We're changing how we build. Instead of hiring someone to type code, we want a senior engineer who can make Claude Code and Codex run the show: plan the work, break it into agent-runnable tasks, write the specs and prompts that produce correct results, run the agents, and rigorously review and verify everything they ship. You are the architect, the operator, and the quality gate. The agents are the hands.

This is a real engineering role, not prompt dabbling. You have to be senior enough to read and judge code expertly, catch subtle bugs and security issues, keep the architecture coherent, and know when the agent is wrong. In our domain (signed, deterministic, compliance-grade evidence), "looks done" is not done.

WHAT YOU'LL DO
- Own end-to-end delivery of features and fixes by directing coding agents (Claude Code, Codex), not by hand-coding.
- Turn loose requirements into precise specs, task breakdowns, and prompts the agents can execute reliably: recon first, small diffs, verify by artifact.
- Review every diff the agents produce with a senior eye for correctness, security, performance, and fit with the existing codebase. Reject slop.
- Protect our non-negotiable invariants, including an append-only, hash-chained, cryptographically signed ledger that must never be corrupted or polluted with test data.
- Improve the agent workflow itself (prompt/spec templates, guardrails, evals and verification checks, MCP tooling, and CI gates) so the agents become more reliable over time.
- Run the loop tightly (plan → prompt → run → review → verify → ship) with clear, frequent communication.

YOU'RE A STRONG FIT IF YOU
- Are a senior/staff-level engineer who could build this by hand but would rather orchestrate agents to do it at higher leverage.
- Have real, demonstrable experience shipping PRODUCTION code primarily through agentic tools (Claude Code, Codex, Cursor agents, Aider, or similar), not just autocomplete.
- Are an excellent code reviewer: you catch what the agent missed and can explain exactly why it matters.
- Know our stack well enough to direct and judge work in it: TypeScript monorepo (pnpm / Turbo), Next.js + React, tRPC, Drizzle ORM + PostgreSQL / Supabase, deployed on Vercel.
- Have a security- and correctness-first mindset. We handle cryptographic signing and compliance evidence, so mistakes have consequences.
- Write clearly. Half this job is writing precise specs and prompts.

NICE TO HAVE
- MCP, agent evaluation/verification harnesses, or experience building internal agent tooling.
- Domain exposure: observability / incident response, compliance (SOC 2, ISO 27001, NYDFS, DORA), cryptographic signing, or fintech-grade systems.
- Webhook-heavy integration work, Stripe, and Supabase/Postgres at scale.

THIS ROLE IS NOT
- A "write every line yourself" job. If you want to hand-code everything, this isn't for you.
- A "vibe-code and ship whatever the agent produces" job. Agent output is held to a production bar and reviewed line by line.
- Junior-friendly. You need the judgment to know when the agent is confidently wrong.

HOW TO APPLY (read carefully; I screen on this)
- Start your proposal with the word DETERMINISTIC so I know you read this.
- Give me 1–2 concrete examples of production features or systems you shipped primarily by directing coding agents: what you built, which agents, and what your role actually was.
- In 3–5 sentences, describe your agentic workflow: how you plan, prompt, review, and verify so the output is trustworthy.
- Tell me about one time an agent was confidently wrong and how you caught it.

Generic proposals and AI-written boilerplate will be passed over.

I'll begin with a small PAID TRIAL TASK (drive an agent to implement or fix one scoped issue end-to-end, and show me your review and verification notes) before moving to an ongoing engagement.
