AI Expert for Voice Bot Latency Reduction

Budget: $10.0 - $15.0 HOURLY / PART_TIME ⭐ 4.84 (12) India

twilio-api, voice-acting, english, voice-over

Voice AI Engineer: LiveKit Agent Latency Tuning (Hindi/Hinglish)

We're building Naysha, a real-time Hindi/Hinglish voice AI diet coach on LiveKit Agents (Python) + LiveKit Cloud. The pipeline works end-to-end; we need an expert to cut response latency to ~2s and harden turn-taking.

Tech stack
- LiveKit Agents (Python SDK), LiveKit Cloud (explicit agent dispatch)
- STT: Google Cloud Speech v2, chirp_3, hi-IN, asia-south1 (final-only, no interims)
- Turn detection: LiveKit MultilingualModel (eou) + Silero VAD
- LLM: custom streaming backend (SSE, /respond), domain-tuned, not swappable
- TTS: Google Chirp3 HD (streaming)
- Redis (session state), httpx; web + telephony frontends

Current state: measured per-turn latency (speech-end → first audio)

Stage                        Time
STT (chirp_3, no interims)   1.3–2.6s
Turn detection (eou)         0.1–0.4s
Connection                   ~0.5s
Backend LLM TTFT             1.7–3.7s
TTS first audio              ~0.8s
Total                        ~4–6s

Done: fillers, instant intro (1.5s), eou cold-start prewarm, TTS underrun fix, reconnect logic, connection reuse.

Where we need help (to hit ~2s)
1. Interim/streaming STT for Hindi: chirp_3 has no interims; long/chirp_2 either fail in the LiveKit pipeline or regress accuracy. We need a fast, accurate streaming-STT path for Hindi/Hinglish (incl. food-term adaptation).
2. Overlap the pipeline: start the LLM/backend on partial transcripts; shave turn-commit (eou) latency without cutting users off.
3. Barge-in / acoustic echo robustness.

Deliverable: reliable ~2s perceived response on real Hindi/Hinglish calls, with accuracy intact.

Must have: hands-on LiveKit Agents + Google STT v2 + real-time voice latency experience.
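
To make the "overlap the pipeline" item concrete, here is a minimal sketch of what we mean by starting the backend on partial transcripts. It is an illustration only, not our code and not the LiveKit Agents API: `fake_backend`, `SpeculativeResponder`, `on_partial` and `on_final` are hypothetical names, the backend is a stub standing in for the streaming /respond call, and it assumes a streaming-STT path that actually emits interim transcripts (which chirp_3 does not today).

```python
import asyncio


async def fake_backend(transcript: str) -> str:
    """Hypothetical stand-in for the streaming /respond call."""
    await asyncio.sleep(0.05)  # simulated time-to-first-token
    return f"reply to: {transcript}"


class SpeculativeResponder:
    """Starts the backend on partial transcripts and reuses the
    in-flight request when the final transcript matches the last partial."""

    def __init__(self, backend):
        self._backend = backend
        self._text = None   # transcript the in-flight request was started with
        self._task = None   # in-flight backend request, if any
        self.starts = 0     # number of backend requests launched

    def on_partial(self, text: str) -> None:
        text = text.strip()
        if text == self._text:
            return  # transcript unchanged: keep the in-flight request
        if self._task is not None:
            self._task.cancel()  # transcript changed: drop the stale request
        self._text = text
        self._task = asyncio.get_running_loop().create_task(self._backend(text))
        self.starts += 1

    async def on_final(self, text: str) -> str:
        # No new request is launched if the last partial already matches.
        self.on_partial(text)
        return await self._task


async def demo():
    responder = SpeculativeResponder(fake_backend)
    responder.on_partial("aaj maine")
    await asyncio.sleep(0.01)
    responder.on_partial("aaj maine paneer khaya")
    await asyncio.sleep(0.03)  # backend keeps working while turn detection decides
    reply = await responder.on_final("aaj maine paneer khaya")
    return reply, responder.starts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the request that answers the user is already in flight when the turn is committed, so part of the backend TTFT is hidden behind the turn-detection wait. A real implementation would also have to decide how stable a partial must be before it is worth a backend call, and make sure cancelled requests have no side effects on session state.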