Computer Vision Engineer Needed for Surgical Video Analysis Benchmark Development
Budget: $250 (fixed price)
⭐ 4.99 (62)
South Korea
pytorch, python, computer-vision, machine-learning, tensorflow, opencv, artificial-intelligence, deep-learning, cuda
Job Description
We are looking for a Computer Vision Engineer to benchmark existing state-of-the-art models on a multi-camera surgical video dataset.
Dataset
3 full surgeries
5 synchronized RGB camera views per surgery
~9 hours total video
Event and role annotations available
No depth data
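With only 3 surgeries, how the data is split matters for a fair comparison. The posting does not specify a protocol; the sketch below shows one possible choice, leave-one-surgery-out, which keeps all 5 camera views of a surgery in the same fold. The surgery IDs and the function name are hypothetical.

```python
def leave_one_surgery_out(surgery_ids):
    """Yield one train/test fold per surgery (hypothetical helper).

    All camera views of a surgery stay in the same fold, so synchronized
    views of the same moment never appear in both train and test.
    """
    ids = sorted(set(surgery_ids))
    for held_out in ids:
        yield {
            "train": [s for s in ids if s != held_out],
            "test": [held_out],
        }


# Placeholder IDs; the real dataset naming is not given in the posting.
folds = list(leave_one_surgery_out(["surgery_1", "surgery_2", "surgery_3"]))
for fold in folds:
    print(fold)
```

Each of the three folds trains on two surgeries and tests on the third; metrics could then be averaged across folds.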
Tasks
1. Role Detection
Evaluate role recognition models (e.g., MM-OR).
Deliverables:
Run inference using pretrained models
Fine-tune models on our dataset
Compare pretrained (no fine-tuning) vs. fine-tuned performance
Report evaluation metrics and implementation details
2. Event Detection
Events include:
Instrument handoff (successful / failed / none)
Door opening / closing
Entry / exit events
Candidate models:
MMAction2 (AVA spatio-temporal action detection models)
SlowFast
VideoMAE
VideoMamba
Other suitable repositories
Deliverables:
Run inference using pretrained models
Fine-tune models on our dataset
Compare pretrained (no fine-tuning) vs. fine-tuned performance
Report event detection accuracy
3. Workflow Recognition
Workflow phases include:
Setup / Anesthesia / Wheels In
Draping / Surgery / Drapes Down
Wheels Out / Turnover
Candidate models:
I3D
TimeSformer
Video Swin Transformer
Other suitable repositories
Deliverables:
Run inference using pretrained models
Fine-tune models on our dataset
Compare pretrained (no fine-tuning) vs. fine-tuned performance
Report workflow recognition accuracy
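A minimal sketch of how the pretrained vs. fine-tuned comparison could be reported for workflow recognition. It assumes per-clip (or per-frame) predictions have already been mapped onto the phase labels listed above; the flat label list, the function names, and the choice of macro-F1 alongside accuracy are illustrative assumptions, not requirements from the posting.

```python
# Phase names come from the posting; treating them as one flat label set
# is an assumption made for this sketch.
PHASES = ["Setup", "Anesthesia", "Wheels In", "Draping",
          "Surgery", "Drapes Down", "Wheels Out", "Turnover"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted phase matches the annotation."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=PHASES):
    """Unweighted mean of per-phase F1, skipping phases that never occur."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        if tp + fp + fn == 0:
            continue  # phase absent from both annotations and predictions
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


def compare(y_true, pretrained_pred, finetuned_pred):
    """Return accuracy and macro-F1 for both model variants."""
    variants = {"pretrained": pretrained_pred, "fine-tuned": finetuned_pred}
    return {
        name: {"accuracy": accuracy(y_true, pred),
               "macro_f1": macro_f1(y_true, pred)}
        for name, pred in variants.items()
    }


# Toy labels for illustration only; these are not results from the dataset.
truth = ["Setup", "Setup", "Surgery", "Surgery"]
report = compare(truth,
                 pretrained_pred=["Setup", "Surgery", "Surgery", "Surgery"],
                 finetuned_pred=["Setup", "Setup", "Surgery", "Surgery"])
print(report)
```

Macro-F1 is included because phase durations are likely imbalanced (the Surgery phase typically dominates), so accuracy alone may hide weak performance on short phases.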
Requirements
Strong PyTorch experience
Experience reproducing and adapting CVPR/ICCV/ECCV video understanding models
Experience with action detection, video classification, or workflow recognition
Familiarity with MMAction2, SlowFast, VideoMAE, VideoMamba, or similar frameworks
Please include:
Relevant projects or repositories
Estimated timeline
Estimated budget