AI Expert for Voice Bot Latency Reduction
Budget: $10.0 - $15.0
HOURLY / PART_TIME
⭐ 4.84 (12)
India
twilio-api, voice-acting, english, voice-over
Voice AI Engineer — LiveKit Agent Latency Tuning (Hindi/Hinglish)
We're building Naysha, a real-time Hindi/Hinglish voice AI diet coach on LiveKit Agents (Python) + LiveKit Cloud. The pipeline works end-to-end; we need an expert to cut response latency to ~2s and harden turn-taking.
Tech stack
LiveKit Agents (Python SDK), LiveKit Cloud (explicit agent dispatch)
STT: Google Cloud Speech v2 — chirp_3, hi-IN, asia-south1 (final-only, no interims)
Turn detection: LiveKit MultilingualModel (eou) + Silero VAD
LLM: custom streaming backend (SSE, /respond) — domain-tuned, not swappable
TTS: Google Chirp3 HD (streaming)
Redis (session state), httpx; web + telephony frontends
Current state — measured per-turn latency (speech-end → first audio):
Stage                        Time
STT (chirp_3, no interims)   1.3–2.6s
Turn detection (eou)         0.1–0.4s
Connection                   ~0.5s
Backend LLM TTFT             1.7–3.7s
TTS first audio              ~0.8s
Total                        ~4–6s
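As a rough sanity check on the table above, the sketch below adds up the per-stage ranges as if every stage ran strictly one after another. The dictionary keys are illustrative names, not identifiers from the codebase, and the two "~" figures are treated as fixed values. The serial sum is only an upper-level estimate. The measured ~4–6s total is lower than the 8.0s worst-case sum, which would be consistent with the worst cases rarely coinciding, though that reading is an assumption.

```python
# Per-stage latency ranges in seconds, copied from the table above.
# "~0.5s" and "~0.8s" are treated as fixed points (an approximation).
STAGES = {
    "stt": (1.3, 2.6),
    "turn_detection": (0.1, 0.4),
    "connection": (0.5, 0.5),
    "llm_ttft": (1.7, 3.7),
    "tts_first_audio": (0.8, 0.8),
}


def serial_bounds(stages):
    """Best and worst case if every stage runs strictly in sequence."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return round(low, 2), round(high, 2)


def ranked_by_worst_case(stages):
    """Stage names ordered from largest to smallest worst-case latency."""
    return sorted(stages, key=lambda name: stages[name][1], reverse=True)


if __name__ == "__main__":
    print(serial_bounds(STAGES))          # (4.4, 8.0)
    print(ranked_by_worst_case(STAGES))   # llm_ttft first, then stt
```

Even the best-case STT plus best-case LLM TTFT comes to 3.0s, so a ~2s target cannot be reached by trimming stages individually. Some of the STT and LLM work has to overlap.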
Done: fillers, instant intro (1.5s), eou cold-start prewarm, TTS underrun fix, reconnect logic, connection reuse.
Where we need help (to hit ~2s):
Interim/streaming STT for Hindi — chirp_3 has no interims; long/chirp_2 either fail in the LiveKit pipeline or regress accuracy. Need a fast, accurate streaming-STT path for Hindi/Hinglish (incl. food-term adaptation).
Overlap the pipeline — start LLM/backend on partial transcripts; shave turn-commit (eou) latency without cutting users off.
Barge-in / acoustic echo robustness.
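The "start LLM/backend on partial transcripts" idea above can be sketched roughly as follows. This is a minimal illustration in plain asyncio, not LiveKit Agents API: `SpeculativeResponder`, `on_partial`, `on_final`, and `fake_backend` are hypothetical names, and the sketch assumes an STT path that emits interim transcripts (which chirp_3 currently does not) and a backend whose /respond call is safe to cancel and repeat.

```python
import asyncio


class SpeculativeResponder:
    """Start the backend call on a partial transcript; restart if the text changes."""

    def __init__(self, call_backend):
        self._call_backend = call_backend  # hypothetical: async fn(text) -> reply
        self._task = None
        self._text = None

    def on_partial(self, text):
        text = text.strip()
        if not text or text == self._text:
            return  # nothing new to speculate on
        self._cancel()
        self._text = text
        self._task = asyncio.create_task(self._call_backend(text))

    async def on_final(self, text):
        text = text.strip()
        if self._task is not None and text == self._text:
            task, self._task, self._text = self._task, None, None
            return await task  # speculation matched: reuse the in-flight call
        self._cancel()  # speculation was wrong or absent: start fresh
        return await self._call_backend(text)

    def _cancel(self):
        if self._task is not None:
            self._task.cancel()
        self._task = None
        self._text = None


async def demo():
    calls = []

    async def fake_backend(text):  # stand-in for the real streaming backend
        calls.append(text)
        await asyncio.sleep(0.01)
        return f"reply:{text}"

    responder = SpeculativeResponder(fake_backend)
    responder.on_partial("aaj maine")
    await asyncio.sleep(0)  # let the speculative task start
    responder.on_partial("aaj maine do roti khayi")
    await asyncio.sleep(0)
    reply = await responder.on_final("aaj maine do roti khayi")
    return reply, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

When the final transcript equals the last partial, the backend call is already in flight by the time the turn is committed, so part of the LLM TTFT is hidden behind the turn-detection wait. The cost is wasted backend calls for superseded partials, and any session state the backend writes (for example in Redis) would need to tolerate cancelled requests.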
Deliverable: reliable ~2s perceived response on real Hindi/Hinglish calls, with accuracy intact.
Must have: hands-on LiveKit Agents + Google STT v2 + real-time voice latency experience.