AI Proxy Layer Development
Budget: -
HOURLY / PART_TIME
USA
deep-learning, python, artificial-intelligence, api
I’m building a proxy layer that sits in front of AI agents and aims to cut their inference cost by 40–70%, automatically and without the agent changing its logic. The work is technically demanding: it calls for hands-on experience with LLM serving systems and cost optimization, not general AI familiarity.
This is NOT an agent-building job. I’m not looking for agent workflows, memory stores, or MCP orchestration. I need the layer underneath that: a high-throughput proxy that normalizes traffic across providers, caches intelligently, routes each call to the cheapest model that holds quality, and proves it.
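To make the "cheapest model that holds quality" idea concrete, here is a rough sketch of the routing and caching core. It is illustrative only, not a spec: `ModelOption`, `pick_model`, `CachingRouter`, the prices, and the quality scores are all hypothetical, and the exact-match cache stands in for whatever smarter caching the real layer would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical model identifier
    usd_per_1k_tokens: float  # hypothetical blended price
    quality: float            # hypothetical eval score in [0, 1]


def pick_model(options: List[ModelOption], min_quality: float) -> ModelOption:
    """Return the cheapest option whose eval score meets the quality floor."""
    eligible = [m for m in options if m.quality >= min_quality]
    if not eligible:
        # Nothing holds quality: fall back to the best-scoring model.
        return max(options, key=lambda m: m.quality)
    return min(eligible, key=lambda m: m.usd_per_1k_tokens)


class CachingRouter:
    """Routes each prompt to the cheapest adequate model, with an exact-match cache."""

    def __init__(self, options: List[ModelOption],
                 call_fn: Callable[[str, str], str]) -> None:
        self.options = options
        self.call_fn = call_fn  # call_fn(model_name, prompt) -> completion text
        self.cache: Dict[Tuple[str, str], str] = {}
        self.upstream_calls = 0

    def complete(self, prompt: str, min_quality: float) -> str:
        model = pick_model(self.options, min_quality)
        key = (model.name, prompt)
        if key not in self.cache:
            # Cache miss: pay for one upstream call and remember the result.
            self.upstream_calls += 1
            self.cache[key] = self.call_fn(model.name, prompt)
        return self.cache[key]
```

In practice the quality scores would come from the eval, and the cache key would have to account for sampling parameters and tool definitions, which this sketch ignores.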
The deliverable I care about most: take one real agentic workload, instrument its current cost, apply the optimization stack, and produce a measured before/after — “cut compute N% with quality held, here’s the eval.” That number is the goal, not a polished UI.
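For clarity on what "measured before/after" means, a minimal sketch of the headline number. The function name `savings_report` and the `max_quality_drop` tolerance are assumptions for illustration; the real threshold and quality metric would come out of the eval design.

```python
def savings_report(baseline_cost_usd: float,
                   optimized_cost_usd: float,
                   baseline_quality: float,
                   optimized_quality: float,
                   max_quality_drop: float = 0.01) -> dict:
    """Summarize a before/after run on the same workload.

    Costs are total spend for the workload; quality values are eval
    scores on the same scale. max_quality_drop is a hypothetical
    tolerance for what counts as "quality held".
    """
    if baseline_cost_usd <= 0:
        raise ValueError("baseline cost must be positive")
    # Percentage of baseline spend that the optimized run avoided.
    cut_pct = 100.0 * (baseline_cost_usd - optimized_cost_usd) / baseline_cost_usd
    # Quality is held if the score dropped by no more than the tolerance.
    quality_held = (baseline_quality - optimized_quality) <= max_quality_drop
    return {"cut_pct": round(cut_pct, 1), "quality_held": quality_held}
```

A cost cut only counts if `quality_held` is true for the same workload and the same eval.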
You’re a fit if you’ve worked on or contributed to:
• vLLM, SGLang, LiteLLM, Portkey, or similar serving/gateway infrastructure
• LLM serving internals: KV/prefix caching, continuous batching, quantization, model routing
• Provider API normalization and tool-call handling across models
• Eval design for LLM output quality
You’re not a fit if your background is agent frameworks, RAG apps, or “I use the OpenAI API.” This role is about what happens below the API call.
To apply — required, or I won’t read it:
1. Link a specific piece of your work (GitHub PR, repo, project) that shows serving/gateway/caching/routing work. Not a portfolio site — the actual code.
2. In two sentences: how would you handle routing a tool-calling request from an OpenAI-shaped agent to an open-weight model whose tool-call format differs?
3. Skip the generic intro. Lead with #1 and #2.
Open job