
AI / Computer Vision Engineer Needed for Video + Audio De-identification and Dataset Preparation

Budget: - HOURLY / PART_TIME ⭐ 5.00 (35) United Kingdom

machine-learning, computer-vision

We are a small early-stage startup working on a medical procedure computer vision dataset project. We are looking for an experienced AI / computer vision engineer to help process a first test batch of approximately 100 hours of video and audio material. The files are expected to be stored in AWS S3, and the initial task is to build or apply a practical pipeline to prepare these files for downstream AI / computer vision model training.

The project involves video and audio from real-world technical procedures. The material may contain sensitive or identifiable information, so the first priority is robust de-identification and media preparation.

We are working with a limited early-stage budget, so we would strongly prefer someone who can make intelligent use of existing open-source tools, libraries and models rather than building everything from scratch or relying on expensive proprietary services.

Initial scope of work:
- Access and process large video/audio files from AWS S3
- Detect and blur/redact visible identifying information in video frames
- Blur or mask faces, names, labels, documents, screens, tattoos or other identifying visual details where present
- Crop or mask areas outside the relevant procedural field where appropriate
- Redact, mute or bleep spoken identifiers in audio, such as names, addresses, dates of birth, locations or other personal details
- Produce de-identified output files suitable for review and later AI model training
- Preserve file organisation and metadata so that each procedure/session remains clearly grouped
- Create a repeatable processing pipeline rather than a purely manual workflow
- Provide clear logs/reports of what has been processed and any files requiring human review

Additional capabilities:
- Audio-video synchronisation where separate audio and video files are provided
- Handling multiple camera angles or multiple video files from the same procedure/session
- Stitching, trimming or aligning related media files where needed
- Using audio transcripts or time markers to create useful labels or annotations
- Creating structured metadata or labels that could support future computer vision, robotics or AI model training
- Experience with dataset preparation for machine learning pipelines
- Experience with secure data handling, cloud storage and controlled access environments
- Knowledge of federated learning or secure model-training environments would be useful for a later phase, but is not essential for the first task
- Basic analytics/reporting on processing status, file quality, duration, redaction success and review flags

Likely tools / technologies:
We are open to your recommendations, but would expect relevant experience with tools such as:
- Python
- OpenCV
- FFmpeg
- Whisper or other speech-to-text tools
- OCR/text detection in video frames
- Face/object/text detection and blurring
- Audio redaction pipelines
- AWS S3 or similar cloud storage workflows
- Open-source ML/computer vision libraries where appropriate

We are not looking for a general web developer. We are specifically looking for someone with practical experience in computer vision, video processing, audio transcription/redaction and dataset preparation.

Initial paid test project:
The first stage is a paid proof-of-concept using around 50 hours of video/audio. The goal is to see whether you can create a workable pipeline and demonstrate that the files can be de-identified and prepared reliably. If the test is successful, there is likely to be a larger ongoing project involving greater volumes of media and more advanced dataset preparation.
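To illustrate the kind of audio redaction step described in the scope above (muting spoken identifiers), here is a minimal sketch, not a prescribed method. It assumes the transcript is available as a list of word dicts with "word", "start" and "end" keys in seconds, shaped like the word-level timestamps the open-source Whisper package can return. The blocklist, the 0.25 s padding and the helper names (`find_identifier_spans`, `merge_spans`, `ffmpeg_mute_filter`) are hypothetical choices for illustration, and exact token matching stands in for the named-entity recognition a real pipeline would need. The output is an FFmpeg `volume` filter that uses timeline editing (`enable=`) to silence each flagged span.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[float, float]

# Hypothetical example blocklist for illustration only; a real project would
# combine a curated identifier list with named-entity recognition and human review.
EXAMPLE_BLOCKLIST: Set[str] = {"smith", "london"}


def merge_spans(spans: Iterable[Span]) -> List[Span]:
    """Merge overlapping or touching (start, end) intervals, in seconds."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it rather than adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_identifier_spans(
    words: List[Dict], blocklist: Set[str], pad: float = 0.25
) -> List[Span]:
    """Return padded, merged time spans for transcript words found in the blocklist.

    Assumes each word is a dict with "word", "start" and "end" keys (seconds).
    """
    spans: List[Span] = []
    for w in words:
        # Normalise tokens such as " Smith," to "smith" before matching.
        token = re.sub(r"\W", "", w["word"]).lower()
        if token in blocklist:
            # Pad each side so word-boundary timing errors do not leak audio.
            spans.append((max(0.0, w["start"] - pad), w["end"] + pad))
    return merge_spans(spans)


def ffmpeg_mute_filter(spans: List[Span]) -> str:
    """Build an FFmpeg audio filter string that silences every span."""
    if not spans:
        return "anull"  # pass the audio through unchanged
    expr = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in spans)
    return f"volume=enable='{expr}':volume=0"


if __name__ == "__main__":
    words = [
        {"word": " My", "start": 0.0, "end": 0.2},
        {"word": " name", "start": 0.2, "end": 0.45},
        {"word": " is", "start": 0.45, "end": 0.6},
        {"word": " Smith,", "start": 0.6, "end": 1.0},
        {"word": " from", "start": 1.1, "end": 1.3},
        {"word": " London.", "start": 1.3, "end": 1.8},
    ]
    spans = find_identifier_spans(words, EXAMPLE_BLOCKLIST)
    print(ffmpeg_mute_filter(spans))
    # volume=enable='between(t,0.350,2.050)':volume=0
```

The resulting string could then be passed to FFmpeg, for example `ffmpeg -i input.mp4 -af "<filter>" -c:v copy output.mp4`, which re-encodes only the audio track. Merging padded spans keeps the filter expression short when identifiers are spoken close together. Bleeping instead of muting, multi-word identifiers (addresses, dates) and transcription errors would all need further handling and human review.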
Important requirements:
- You must be comfortable working with sensitive data under strict confidentiality
- You must be willing to sign an NDA if required
- You should be able to explain your proposed approach clearly
- You should be able to work pragmatically and cost-effectively
- You should prioritise open-source and low-cost tooling wherever possible
- We are a small pre-funding startup, so we are looking for someone technically strong but realistic on budget and scope

Please include in your response:
- Relevant experience with video/audio de-identification
- Tools or libraries you would likely use
- Which parts you would expect to handle with open-source tools
- Whether you have worked with AWS S3
- How you would approach processing 50 hours of video/audio
- Any similar dataset preparation work you have done
- Your estimated cost and timeline for the first paid test project
- Whether you are working alone or as part of a team