Claude Code & Prompt Engineering Expert: multi-agent AI systems

Budget: $40–$70 hourly / full-time ⭐ 4.96 (66) Japan

api-integration, python

We're looking for a senior AI Agent Engineer to help improve the quality and reliability of a production multi-agent AI platform. Our agents do real marketing work for real clients every day, including research, creative development, social media, paid media analysis, and presentations. The platform runs on GPT-5 class models, while we build and orchestrate our agent workflows in Claude Code using skills, subagents, MCP servers, hooks, and deterministic gates. This is not a role for someone who has only experimented with prompts. We need someone who understands how production agent systems behave, knows how to debug them, and takes ownership of output quality.

What you'll do
• Design, refine, and optimize prompts so agents behave consistently, choose the right tools, follow workflows, respect gates, and never expose internal implementation details.
• Build and improve agent workflows in Claude Code using skills, subagents, MCP servers, hooks, and gates.
• Debug agent behavior by reading logs and execution traces, identifying root causes, and fixing issues at the prompt, tool, workflow, or gate level.
• Build evaluation suites and regression tests so improvements are measurable and changes don't introduce regressions.
• Integrate new tools and data sources through MCP and optimize how agents decide when and how to use them.
• Continuously improve reliability, efficiency, and output quality across the system.

What we're looking for
• Extensive experience building production systems in Claude Code.
• Strong prompt engineering skills with a deep understanding of tool use, multi-step workflows, context management, and agent orchestration.
• Solid knowledge of current Claude and GPT model families, including their strengths, limitations, and when to use each.
• Strong judgment. You evaluate the quality of the final output, not just whether the prompt was followed.
• Comfortable investigating logs, execution traces, and SQL databases such as Postgres or Supabase.
• Python for scripting, automation, evaluations, and integration work.
• Self-directed, proactive, and comfortable owning problems from diagnosis through validation.

Nice to have
• Experience in marketing, advertising, or creative AI workflows.
• Familiarity with evaluation frameworks such as OpenAI Evals or similar.
• Experience with image or video generation pipelines.

How we work
We're an async-first team that moves quickly. Some overlap with UK working hours is helpful. We use Linear for project management and GitHub for development. Clear, straightforward communication matters more than buzzwords.

To apply
Please include brief answers to the following:
1. Tell us about a production agentic system you built or improved in Claude Code using skills, MCP, subagents, or gates. What made it reliable?
2. Describe a time you diagnosed a misbehaving LLM agent using logs or execution traces. How did you identify the root cause, and how did you verify your fix?
3. How do you decide which AI model and settings to use for different tasks? How do you design gates that prevent agents from making incorrect decisions or producing unsafe outputs?

A short Loom video or a concise written response is perfectly fine. We're interested in how you think and solve problems.
