Real-Time Voice Conversion & Low-Latency Streaming

Budget: - HOURLY / PART_TIME ⭐ 4.63 (37) Morocco

audio-engineering, audio-effects

## Overview

We are looking for a highly skilled AI Audio Engineer to take full ownership of the design, implementation, and deployment of a production-grade real-time voice conversion system.

The objective is to enable a speaker's voice to be transformed in real time into a target voice during professional online meetings, while maintaining natural conversation flow, low latency, and high reliability.

This is not a research project. The goal is to build a robust solution that can be used regularly in production environments.

The final system should:

- Convert Speaker A's voice into Voice X in real time.
- Work at the operating system level on Windows, using the computer's microphone and audio devices. (It should work in Teams, Google Meet, virtual machines, and anywhere else.)
- Be reliable enough for repeated professional use.
- Maintain natural conversation quality.
- Keep latency below approximately 300 ms whenever possible.
- Require minimal intervention from the end user.

## Existing Assets

* Several hours of high-quality recordings of the target voice are available.
* The target language is English.
* The primary environment is Windows.
* One user will operate the system.

## Scope of Work

The selected freelancer will be expected to:

### Architecture & Technical Design

* Evaluate existing voice cloning and voice conversion technologies.
* Recommend the most suitable architecture.
* Identify technical risks, limitations, and mitigation strategies.
* Define the fastest path toward a production-ready solution.

### Prototype Development

* Build a proof of concept demonstrating real-time voice conversion.
* Measure latency, stability, and voice quality.
* Test compatibility with Microsoft Teams inside a virtual machine.

### Production Implementation

* Improve reliability and audio quality.
* Optimize latency.
* Implement all required audio routing and virtual device configurations.
* Deliver a solution suitable for regular professional usage.

### Troubleshooting & Ownership

We are specifically looking for someone who enjoys solving difficult technical problems and taking ownership. The ideal candidate should be comfortable dealing with:

* Audio routing challenges
* Windows audio stack issues
* Virtual microphone configurations
* Real-time streaming constraints
* Latency optimization
* Voice model tuning
* Unexpected production issues

The expectation is not simply to write code or find the right technology, but to make the project successful regardless of obstacles encountered along the way.

## Required Experience

Strong experience with some of the following:

* Real-time voice conversion
* Voice cloning
* Speech-to-speech AI systems
* Audio DSP
* WebRTC
* Low-latency streaming systems
* Windows audio systems
* Python (if required)
* Machine Learning (if required)
* LLM and speech technologies

Experience with production-grade audio applications is highly preferred.

## Deliverables

* Working prototype
* Production-ready implementation
* Installation and deployment instructions
* Technical documentation
* Risk assessment and recommendations
* Ongoing support during stabilization phase

## Ideal Profile

We are looking for a builder and troubleshooter rather than a researcher. Someone who:

* Takes full ownership.
* Is highly pragmatic.
* Finds solutions instead of blockers.
* Can challenge assumptions and provide honest recommendations.
* Is comfortable making difficult projects work in real-world conditions.
