AI Developer — Build a Whisper-Based Subtitle Extraction Tool from Scratch (Python/Flask)
Budget: $30.00 (fixed price)
South Korea
python, api, machine-learning, flask, artificial-intelligence, python-script, node.js
We need an AI-powered subtitle extraction tool built from the ground up — a tool that takes any video file and automatically generates accurate subtitles in the video's original spoken language, using local AI speech recognition (no cloud APIs, no translation).
What it should do:
Accept video/audio file uploads through a simple web UI
Extract audio and clean it up (remove silence/noise) before transcription
Use an AI speech-to-text model (Whisper) to transcribe speech — automatically detecting the spoken language (English, Korean, Russian, Turkish, etc. — any language)
Merge raw transcription chunks into natural, well-timed subtitle sentences
Output a clean, properly formatted .srt file, downloadable from the browser
Support queued processing — multiple videos can be uploaded and processed one after another in the background
Allow canceling a queued job before it starts
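The merging and .srt output steps can be sketched in plain Python. This is a minimal sketch, not a specification: it assumes the transcriber returns chunks as (start, end, text) with times in seconds, and the `Segment` class, `merge_segments`, and the thresholds (84 characters, 0.8 s gap, 6 s per subtitle) are hypothetical choices. Joining chunks with a space suits space-delimited languages such as English, Korean, Russian, or Turkish. Chinese or Japanese text would need a different joiner.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    text: str


# Assumption: only ASCII sentence-ending punctuation is recognized here.
SENTENCE_END = (".", "?", "!")


def merge_segments(
    segments: Iterable[Segment],
    max_chars: int = 84,
    max_gap: float = 0.8,
    max_duration: float = 6.0,
) -> List[Segment]:
    """Greedily join consecutive chunks until a sentence ends or a limit is hit."""
    merged: List[Segment] = []
    current = None
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty chunks
        if current is None:
            current = Segment(seg.start, seg.end, text)
            continue
        candidate = current.text + " " + text
        gap = seg.start - current.end
        too_long = len(candidate) > max_chars or seg.end - current.start > max_duration
        if current.text.endswith(SENTENCE_END) or gap > max_gap or too_long:
            merged.append(current)  # close the current subtitle, start a new one
            current = Segment(seg.start, seg.end, text)
        else:
            current = Segment(current.start, seg.end, candidate)
    if current is not None:
        merged.append(current)
    return merged


def format_timestamp(seconds: float) -> str:
    """SRT timestamps use HH:MM:SS,mmm (comma before the milliseconds)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Numbered cues, each followed by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(blocks)
```

Writing the resulting string with UTF-8 encoding keeps non-Latin scripts such as Korean or Russian intact in the downloaded file.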
Tech we expect you to use:
Whisper (AI speech recognition model) — for transcription, ideally via mlx-whisper for Apple Silicon performance, or openai-whisper / faster-whisper as alternatives
Voice Activity Detection (AI model) — e.g. Silero VAD — to strip silence before transcription for better accuracy
Python backend (Flask or similar) with a background job queue
ffmpeg for audio extraction
Simple, clean web frontend (HTML/CSS/JS) — no framework required
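A background queue that processes uploads one after another and allows canceling a job before it starts could look roughly like this sketch, built only on Python's standard library (`queue`, `threading`). `JobQueue`, `submit`, `cancel`, and the status strings are hypothetical names. `process` stands in for whatever per-file pipeline (ffmpeg, VAD, Whisper, .srt writing) the app runs.

```python
import queue
import threading
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    job_id: str
    path: str
    status: str = "queued"  # queued -> running -> done | failed, or queued -> cancelled
    result: Any = None
    error: Optional[str] = None


class JobQueue:
    """Runs submitted jobs one after another on a single background thread."""

    def __init__(self, process: Callable[[str], Any]):
        self._process = process  # hypothetical per-file pipeline
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()
        self._pending: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, path: str) -> str:
        job = Job(job_id=uuid.uuid4().hex, path=path)
        with self._lock:
            self._jobs[job.job_id] = job
        self._pending.put(job.job_id)
        return job.job_id

    def cancel(self, job_id: str) -> bool:
        """Cancel a job only if it has not started; returns False otherwise."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None or job.status != "queued":
                return False
            job.status = "cancelled"
            return True

    def status(self, job_id: str) -> Optional[str]:
        with self._lock:
            job = self._jobs.get(job_id)
            return job.status if job else None

    def join(self) -> None:
        """Block until every submitted job has been processed or skipped."""
        self._pending.join()

    def _worker(self) -> None:
        while True:
            job_id = self._pending.get()
            with self._lock:
                job = self._jobs[job_id]
                skip = job.status == "cancelled"
                if not skip:
                    job.status = "running"
            if skip:
                self._pending.task_done()  # cancelled jobs are dropped without running
                continue
            try:
                result = self._process(job.path)
                with self._lock:
                    job.result, job.status = result, "done"
            except Exception as exc:  # record the failure and keep the worker alive
                with self._lock:
                    job.error, job.status = str(exc), "failed"
            finally:
                self._pending.task_done()
```

A single worker keeps jobs strictly sequential, which also avoids loading the Whisper model into memory twice. In a Flask app, the upload route would call `submit()` and return the job ID, and a cancel route would call `cancel()`. Job state here lives in memory only, so it is lost when the server restarts.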
Why this is an AI project:
This isn't just file handling — the core value is two AI models working together (speech detection + speech-to-text transcription) to turn raw audio into readable, well-timed subtitles automatically, without any manual transcription or paid translation API.
Requirements:
Proven experience with Whisper or similar ASR (Automatic Speech Recognition) models
Comfortable with audio preprocessing (ffmpeg, sample rates, normalization)
Python backend experience (Flask/FastAPI)
Bonus: experience with MLX (Apple Silicon ML framework) or CUDA-accelerated inference
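For the audio preprocessing step, one possible approach is to have ffmpeg write 16 kHz mono 16-bit PCM, a sample rate that both Whisper and Silero VAD accept, optionally applying ffmpeg's `loudnorm` filter for loudness normalization. The helper names below are hypothetical, and whether normalization improves accuracy on a given set of videos is an assumption to verify.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(
    src: PathLike, dst: PathLike, sample_rate: int = 16000, normalize: bool = True
) -> List[str]:
    """Build an ffmpeg argument list that extracts mono 16-bit PCM audio from a video or audio file."""
    cmd = [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input video/audio file
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample; 16 kHz suits both Whisper and Silero VAD
    ]
    if normalize:
        cmd += ["-af", "loudnorm"]  # EBU R128 loudness normalization (single pass)
    cmd += ["-c:a", "pcm_s16le", str(dst)]  # 16-bit little-endian PCM, e.g. a .wav output
    return cmd


def extract_audio(src: PathLike, dst: PathLike, **kwargs) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(src, dst, **kwargs), check=True, capture_output=True)
    return Path(dst)
```

Passing an argument list instead of a shell string avoids quoting problems with uploaded filenames that contain spaces.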
Deliverables: a fully working app, its source code, and brief documentation on setting it up and running it locally.