AI Proxy Layer Development

Budget: - HOURLY / PART_TIME ⭐ 0.00 (0) USA

deep-learning, python, artificial-intelligence, api

I’m building a proxy layer that sits in front of AI agents and cuts their inference cost 40–70%, automatically, without the agent changing its logic. The project involves complex technical tasks, requiring experience in AI and cost optimization. The ideal candidate will have a strong understanding of AI systems and be able to develop efficient solutions.

This is NOT an agent-building job. I’m not looking for agent workflows, memory stores, or MCP orchestration. I need the layer underneath that: a high-throughput proxy that normalizes traffic across providers, caches intelligently, routes each call to the cheapest model that holds quality, and proves it.

The deliverable I care about most: take one real agentic workload, instrument its current cost, apply the optimization stack, and produce a measured before/after: “cut compute N% with quality held, here’s the eval.” That number is the goal, not a polished UI.

You’re a fit if you’ve worked on / contributed to:
• vLLM, SGLang, LiteLLM, Portkey, or similar serving/gateway infrastructure
• LLM serving internals: KV/prefix caching, continuous batching, quantization, model routing
• Provider API normalization and tool-call handling across models
• Eval design for LLM output quality

You’re not a fit if your background is agent frameworks, RAG apps, or “I use the OpenAI API.” This role is about what happens below the API call.

To apply (required, or I won’t read it):
1. Link a specific piece of your work (GitHub PR, repo, project) that shows serving/gateway/caching/routing work. Not a portfolio site: the actual code.
2. In two sentences: how would you handle routing a tool-calling request from an OpenAI-shaped agent to an open-weight model whose tool-call format differs?
3. Skip the generic intro. Lead with #1 and #2.
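
For illustration only, the before/after deliverable described above could be summarized with something as small as the sketch below. Everything in it is an assumption rather than a spec: the names `RunStats` and `before_after`, the 0-to-1 eval score, the `max_quality_drop` tolerance, and the numbers in the usage line are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate numbers for one run of the workload (hypothetical shape)."""
    cost_usd: float    # total inference spend for the run
    eval_score: float  # quality score from the agreed eval, assumed to be in [0, 1]


def before_after(baseline: RunStats, optimized: RunStats,
                 max_quality_drop: float = 0.01) -> dict:
    """Report percent cost savings and whether quality stayed within tolerance."""
    if baseline.cost_usd <= 0:
        raise ValueError("baseline cost must be positive")
    savings_pct = 100.0 * (baseline.cost_usd - optimized.cost_usd) / baseline.cost_usd
    quality_delta = optimized.eval_score - baseline.eval_score
    return {
        "savings_pct": round(savings_pct, 1),
        "quality_delta": round(quality_delta, 4),
        # "Quality held" here means the score dropped by no more than the tolerance.
        "quality_held": quality_delta >= -max_quality_drop,
    }


# Placeholder numbers, purely illustrative.
report = before_after(RunStats(cost_usd=100.0, eval_score=0.90),
                      RunStats(cost_usd=45.0, eval_score=0.895))
print(report)
```

The point of the sketch is that the savings figure is only reported alongside an explicit quality check, so “N% cheaper” cannot be claimed without the eval result next to it.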
