# Real-Time Voice Conversion & Low-Latency Streaming
Budget: not specified
Hourly / part-time
Client rating: ⭐ 4.63 (37 reviews)
Client location: Morocco
Skills: audio-engineering, audio-effects
## Overview
We are looking for a highly skilled AI Audio Engineer to take full ownership of the design, implementation, and deployment of a production-grade real-time voice conversion system.
The objective is to transform a speaker's voice into a target voice in real time during professional online meetings, while maintaining natural conversation flow, low latency, and high reliability.
This is not a research project. The goal is to build a robust solution that can be used regularly in production environments.
The final system should:
- Convert Speaker A's voice into Voice X in real time.
- Work at the operating-system level on Windows, using the computer's microphone and audio devices, so that it operates in Microsoft Teams, Google Meet, inside virtual machines, and in any other application.
- Be reliable enough for repeated professional use.
- Maintain natural conversation quality.
- Keep latency below approximately 300 ms whenever possible.
- Require minimal intervention from the end user.
## Existing Assets
* Several hours of high-quality recordings of the target voice are available.
* The target language is English.
* The primary environment is Windows.
* One user will operate the system.
## Scope of Work
The selected freelancer will be expected to:
### Architecture & Technical Design
* Evaluate existing voice cloning and voice conversion technologies.
* Recommend the most suitable architecture.
* Identify technical risks, limitations, and mitigation strategies.
* Define the fastest path toward a production-ready solution.
### Prototype Development
* Build a proof of concept demonstrating real-time voice conversion.
* Measure latency, stability, and voice quality.
* Test compatibility with Microsoft Teams running inside a virtual machine.
### Production Implementation
* Improve reliability and audio quality.
* Optimize latency.
* Implement all required audio routing and virtual device configurations.
* Deliver a solution suitable for regular professional usage.
### Troubleshooting & Ownership
We are specifically looking for someone who enjoys solving difficult technical problems and taking ownership.
The ideal candidate should be comfortable dealing with:
* Audio routing challenges
* Windows audio stack issues
* Virtual microphone configurations
* Real-time streaming constraints
* Latency optimization
* Voice model tuning
* Unexpected production issues
The expectation is not simply to write code or find the right technology, but to make the project successful regardless of obstacles encountered along the way.
## Required Experience
Strong experience with some of the following:
* Real-time voice conversion
* Voice cloning
* Speech-to-speech AI systems
* Audio DSP
* WebRTC
* Low-latency streaming systems
* Windows audio systems
* Python (if required by the chosen approach)
* Machine learning (if required by the chosen approach)
* LLM and speech technologies
Experience with production-grade audio applications is highly preferred.
## Deliverables
* Working prototype
* Production-ready implementation
* Installation and deployment instructions
* Technical documentation
* Risk assessment and recommendations
* Ongoing support during the stabilization phase
## Ideal Profile
We are looking for a builder and troubleshooter rather than a researcher.
Someone who:
* Takes full ownership.
* Is highly pragmatic.
* Finds solutions instead of blockers.
* Can challenge assumptions and provide honest recommendations.
* Is comfortable making difficult projects work in real-world conditions.