Senior Computer Vision / OCR / Machine Learning Engineer

Budget: $5.0 - $40.0 HOURLY / FULL_TIME ⭐ 4.49 (175) United States

embedded-systems, microcontroller-programming, reverse-engineering

We’re looking for an exceptional Computer Vision / Machine Learning engineer to help solve difficult document understanding and OCR problems. Your first project will involve improving recognition of financial documents where traditional OCR struggles. If successful, you’ll later work on understanding complex engineering diagrams and other challenging visual reasoning tasks.

This is an opportunity to work on real-world AI problems that combine computer vision, OCR, deep learning, and frontier AI models.

What you’ll work on

Initially you’ll help build an advanced OCR pipeline capable of handling difficult images where existing commercial solutions fail. Examples include:
* Detecting regions of interest
* Image normalization and deskewing
* Character segmentation
* OCR under heavy occlusion
* Vision Transformer experimentation
* Training custom models
* Combining classical computer vision with modern AI models
* Confidence scoring and quality validation

Later projects may include:
* Reading electrical wiring diagrams
* Symbol detection
* Diagram understanding
* Graph extraction
* Engineering document intelligence
* Vision-language models (VLMs)

⸻

Required Skills
* Excellent Python
* PyTorch
* Computer Vision
* OCR systems
* CNNs and Vision Transformers
* YOLO or similar object detection models
* OpenCV
* Model training and fine-tuning
* Dataset preparation
* Deep learning debugging
* Reading and implementing research papers

⸻

Nice to Have
* PaddleOCR
* TrOCR
* Donut
* Florence-2
* GroundingDINO
* Segment Anything (SAM)
* Hugging Face
* TensorRT
* ONNX
* Document AI
* LayoutLM
* Synthetic data generation
* Active learning

⸻

What We’re Looking For

We’re not looking for someone who only calls APIs. We’re looking for someone who has actually built and trained vision models.
You should be comfortable:
* improving datasets
* experimenting with architectures
* reading research papers
* reproducing papers
* measuring accuracy
* designing experiments
* shipping working models

⸻

Example First Project

Build a prototype that improves OCR accuracy on difficult financial documents where handwriting overlaps printed text.

Possible approaches include:
* Detecting printed characters separately from handwriting
* Recovering partially occluded text
* Training custom recognition models
* Using modern Vision Language Models
* Building confidence estimation
* Routing uncertain cases for human review

Success will be measured by recovering a meaningful percentage of documents that existing OCR systems cannot read.

⸻

Ideal Background

You’ve worked on projects involving:
* document OCR
* invoice OCR
* receipt OCR
* passport OCR
* check processing
* handwriting recognition
* industrial vision
* document AI
* medical imaging
* engineering diagrams

⸻

To Apply

Please answer ALL of the following:
1. Describe the hardest Computer Vision project you’ve built.
2. Have you trained your own models from scratch? Which ones?
3. Which OCR systems have you worked with?
4. Which Vision Transformers have you used?
5. Have you fine-tuned YOLO? Which version?
6. Have you reproduced any research papers? Which ones?
7. What is your experience with PyTorch?
8. Have you built production ML pipelines?
9. Which recent computer vision papers excited you most?
10. Include links to your GitHub, publications, Kaggle, or portfolio.

Also include the word “MICR” at the top of your proposal so we know you read the full description.
