AI / Computer Vision Engineer Needed for Video + Audio De-identification and Dataset Preparation
Budget: -
HOURLY / PART_TIME
⭐ 5.00 (35)
United Kingdom
machine-learning, computer-vision
We are a small early-stage startup working on a medical procedure computer vision dataset project. We are looking for an experienced AI / computer vision engineer to help process a first test batch of approximately 50 hours of video and audio material.
The files are expected to be stored in AWS S3, and the initial task is to build or apply a practical pipeline to prepare these files for downstream AI / computer vision model training.
The project involves video and audio from real-world technical procedures. The material may contain sensitive or identifiable information, so the first priority is robust de-identification and media preparation.
We are working with a limited early-stage budget, so we would strongly prefer someone who can make intelligent use of existing open-source tools, libraries and models rather than building everything from scratch or relying on expensive proprietary services.
Initial scope of work:
Access and process large video/audio files from AWS S3
Detect and blur/redact visible identifying information in video frames
Blur or mask faces, names, labels, documents, screens, tattoos, or other identifying visual details where present
Crop or mask areas outside the relevant procedural field where appropriate
Redact, mute or bleep spoken identifiers in audio, such as names, addresses, dates of birth, locations or other personal details
Produce de-identified output files suitable for review and later AI model training
Preserve file organisation and metadata so that each procedure/session remains clearly grouped
Create a repeatable processing pipeline rather than a purely manual workflow
Provide clear logs/reports of what has been processed and any files requiring human review
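To illustrate the audio redaction item above, here is a minimal sketch of one possible approach, not a required design. It assumes a speech-to-text step (for example Whisper with word-level timestamps) has already produced start/end times, in seconds, for each spoken identifier. The function names, the 0.25 s padding and the file names are hypothetical; the only external tool assumed is the FFmpeg `volume` filter with its `enable` timeline option.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_spans(spans: List[Span], pad: float = 0.25) -> List[Span]:
    """Pad each span by `pad` seconds and merge spans that overlap or touch."""
    padded = sorted((max(0.0, start - pad), end + pad) for start, end in spans)
    merged: List[Span] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mute_filter(spans: List[Span]) -> str:
    """Build an FFmpeg audio filter that sets volume to 0 inside every span."""
    expr = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in spans)
    return f"volume=enable='{expr}':volume=0"


def ffmpeg_mute_command(src: str, dst: str, spans: List[Span],
                        pad: float = 0.25) -> List[str]:
    """Return an FFmpeg argument list; the video stream is copied untouched."""
    merged = merge_spans(spans, pad)
    cmd = ["ffmpeg", "-y", "-i", src]
    if merged:
        cmd += ["-af", mute_filter(merged)]
    cmd += ["-c:v", "copy", dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical spans for spoken identifiers found in a transcript.
    identifier_spans = [(1.0, 2.0), (2.1, 3.0), (10.0, 11.0)]
    print(" ".join(ffmpeg_mute_command("session01.mp4",
                                       "session01_redacted.mp4",
                                       identifier_spans)))
```

The sketch only builds the command; running it (for example with `subprocess.run`) and choosing between muting and bleeping are left open. Padding and merging the spans is a deliberately cautious choice, since timestamps from speech-to-text tools can be slightly off.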
Additional capabilities:
Audio-video synchronisation where separate audio and video files are provided
Handling multiple camera angles or multiple video files from the same procedure/session
Stitching, trimming or aligning related media files where needed
Using audio transcripts or time markers to create useful labels or annotations
Creating structured metadata or labels that could support future computer vision, robotics or AI model training
Experience with dataset preparation for machine learning pipelines
Experience with secure data handling, cloud storage and controlled access environments
Knowledge of federated learning or secure model-training environments would be useful for a later phase, but is not essential for the first task
Basic analytics/reporting on processing status, file quality, duration, redaction success and review flags
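As a rough illustration of the session grouping and status reporting mentioned above, the sketch below groups storage keys by session and summarises processing records. It is only an example under stated assumptions: it assumes each S3 key starts with a session folder (such as `s01/cam1.mp4`), and the record fields `key`, `duration_s` and `status`, along with the status value `needs_review`, are hypothetical names rather than an agreed schema.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_session(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group object keys by their first path component (assumed session ID)."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        parts = PurePosixPath(key).parts
        # Keys without a folder prefix are collected separately for review.
        session = parts[0] if len(parts) > 1 else "_ungrouped"
        sessions[session].append(key)
    return {session: sorted(files) for session, files in sessions.items()}


def summarise(records: List[dict]) -> dict:
    """Summarise per-file records into counts, total hours and review flags."""
    by_status = Counter(r["status"] for r in records)
    total_hours = sum(r["duration_s"] for r in records) / 3600
    return {
        "files": len(records),
        "hours": round(total_hours, 2),
        "by_status": dict(by_status),
        "needs_review": sorted(r["key"] for r in records
                               if r["status"] == "needs_review"),
    }


if __name__ == "__main__":
    records = [
        {"key": "s01/cam1.mp4", "duration_s": 1800, "status": "done"},
        {"key": "s01/audio.wav", "duration_s": 1800, "status": "done"},
        {"key": "s02/cam1.mp4", "duration_s": 3600, "status": "needs_review"},
    ]
    print(group_by_session(r["key"] for r in records))
    print(summarise(records))
```

In practice the key list could come from an S3 listing and the summary could be written out as JSON or CSV alongside the de-identified files; both choices are open to your recommendation.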
Likely tools / technologies:
We are open to your recommendations, but would expect relevant experience with tools such as:
Python
OpenCV
FFmpeg
Whisper or other speech-to-text tools
OCR/text detection in video frames
Face/object/text detection and blurring
Audio redaction pipelines
AWS S3 or similar cloud storage workflows
Open-source ML/computer vision libraries where appropriate
We are not looking for a general web developer. We are specifically looking for someone with practical experience in computer vision, video processing, audio transcription/redaction and dataset preparation.
Initial paid test project:
The first stage is a paid proof-of-concept using around 50 hours of video/audio. The goal is to see whether you can create a workable pipeline and demonstrate that the files can be de-identified and prepared reliably.
If the test is successful, there is likely to be a larger ongoing project involving greater volumes of media and more advanced dataset preparation.
Important requirements:
You must be comfortable working with sensitive data under strict confidentiality
You must be willing to sign an NDA if required
You should be able to explain your proposed approach clearly
You should be able to work pragmatically and cost-effectively
You should prioritise open-source and low-cost tooling wherever possible
We are a small pre-funding startup, so we are looking for someone technically strong but realistic on budget and scope
Please include in your response:
Relevant experience with video/audio de-identification
Tools or libraries you would likely use
Which parts you would expect to handle with open-source tools
Whether you have worked with AWS S3
How you would approach processing 50 hours of video/audio
Any similar dataset preparation work you have done
Your estimated cost and timeline for the first paid test project
Whether you are working alone or as part of a team