AI/ML Engineer Needed for OCR and Document Parsing Pipeline
Budget: $200.0
FIXED
⭐ 4.53 (11)
Ukraine
artificial-intelligence, machine-learning, python, computer-vision, data-science
We are looking for an experienced AI/ML engineer to design and implement an OCR and document parsing pipeline that can reliably extract structured data from scanned documents and PDFs.
The goal is to automate data extraction from semi-structured and unstructured documents (invoices, forms, reports, etc.) into a clean, machine-readable format that can feed our backend systems.
Project Scope
- Build an end-to-end OCR and document parsing pipeline (from file upload to structured JSON output or database insertion).
- Handle a variety of document layouts (scanned PDFs, images, multi-page documents, rotated/cropped pages, noisy scans).
- Support at least 100 document types initially (e.g., invoices, receipts, ID documents, medical reports), with a design that makes adding new templates straightforward.
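To illustrate what "adding new templates straightforward" could look like, here is a minimal sketch, not a prescribed design. It assumes each document type is described by a small set of regex patterns applied to already-extracted OCR text. The names `DocumentTemplate`, `register_template`, `parse_document`, the field names, and the sample text are all hypothetical and used only for illustration.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DocumentTemplate:
    """Hypothetical template: one regex (with one capture group) per field."""
    name: str
    field_patterns: Dict[str, str]

    def parse(self, text: str) -> dict:
        fields: Dict[str, Optional[str]] = {}
        for field_name, pattern in self.field_patterns.items():
            match = re.search(pattern, text, flags=re.IGNORECASE)
            # Missing fields are reported as None rather than dropped.
            fields[field_name] = match.group(1).strip() if match else None
        return {"document_type": self.name, "fields": fields}


# Registry of known document types; adding a type is a single call.
TEMPLATES: Dict[str, DocumentTemplate] = {}


def register_template(template: DocumentTemplate) -> None:
    TEMPLATES[template.name] = template


def parse_document(doc_type: str, ocr_text: str) -> dict:
    return TEMPLATES[doc_type].parse(ocr_text)


register_template(DocumentTemplate(
    name="invoice",
    field_patterns={
        "invoice_number": r"invoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
        "total": r"total\s*[:\-]?\s*\$?\s*([0-9][0-9,]*\.?[0-9]*)",
    },
))

# Made-up OCR output, standing in for the text an OCR engine would return.
sample_text = "Invoice No: INV-2041\nDate: 2024-01-15\nTotal: $1,250.00"
parsed = parse_document("invoice", sample_text)
print(json.dumps(parsed, indent=2))
```

In practice a regex-only approach would likely cover only the most regular layouts; the same registry shape could hold ML-based extractors per document type instead.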
Responsibilities
- Analyze our sample documents and define the optimal OCR and parsing strategy (rules-based, ML-based, or hybrid).
- Implement OCR using tools such as Tesseract, EasyOCR, Google Vision, AWS Textract, or similar.
- Design and train ML models for layout analysis and field extraction (e.g., key-value pairs, tables, named entities) where needed.
- Build document preprocessing steps (deskewing, denoising, contrast/thresholding, language selection).
- Implement robust parsing logic to map extracted text to structured fields (JSON schema or database model).
- Set up evaluation metrics (precision/recall/accuracy per field) and iterate to reach agreed quality thresholds.
- Package the solution as an API or microservice that our backend can call (REST/GraphQL, etc.).
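As a rough illustration of the per-field evaluation mentioned above, the following sketch computes precision and recall for each field from exact-match comparisons. It is one possible scoring scheme under stated assumptions, not an agreed metric definition: predictions and labels are plain dicts, a missing value is `None`, and a wrong value counts as both a false positive and a false negative. The function name `evaluate_fields` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def evaluate_fields(predictions: List[Record], ground_truth: List[Record]) -> dict:
    """Per-field precision/recall using exact string match (an assumption)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, truth in zip(predictions, ground_truth):
        for name in set(pred) | set(truth):
            p, t = pred.get(name), truth.get(name)
            if p is not None and p == t:
                counts[name]["tp"] += 1
            else:
                if p is not None:
                    counts[name]["fp"] += 1  # extracted a wrong or spurious value
                if t is not None:
                    counts[name]["fn"] += 1  # failed to recover the true value

    report = {}
    for name, c in counts.items():
        extracted = c["tp"] + c["fp"]
        expected = c["tp"] + c["fn"]
        report[name] = {
            "precision": c["tp"] / extracted if extracted else 0.0,
            "recall": c["tp"] / expected if expected else 0.0,
        }
    return report


# Made-up example data for two documents.
predictions = [
    {"total": "100.00", "date": "2024-01-15"},
    {"total": "55.10", "date": None},
]
ground_truth = [
    {"total": "100.00", "date": "2024-01-15"},
    {"total": "55.70", "date": "2024-02-01"},
]
report = evaluate_fields(predictions, ground_truth)
print(report)
```

Exact match is strict; a real evaluation might normalize values first (whitespace, date formats, currency symbols) before comparing.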
Required Skills and Experience
- Strong experience with Python for AI/ML and backend scripting.
- Hands-on experience with OCR and document AI (e.g., Tesseract, EasyOCR, PaddleOCR, Google Vision, AWS Textract, Azure Form Recognizer).
- Solid understanding of computer vision and NLP techniques for document understanding (layout analysis, entity extraction, text normalization).
- Experience with one or more ML/DL frameworks: PyTorch, TensorFlow, or similar.
- Familiarity with data preprocessing, feature engineering, and model evaluation best practices.
- Experience exposing models as APIs or integrating ML pipelines into production systems (Docker, FastAPI, Flask, etc.).
- Strong communication skills and ability to document code and decisions clearly.
Nice-to-Have
- Experience with document layout models (LayoutLM, Donut, TrOCR, or similar).
- Prior work on invoice/receipt parsing, ID document extraction, or healthcare document workflows.
- Knowledge of MLOps practices (model versioning, logging, monitoring, CI/CD for ML).
- Experience with major cloud platforms (AWS/GCP/Azure) and their document AI services.
Deliverables
- A working OCR and document parsing pipeline integrated with our backend (or exposed as a well-documented API).
- Clean, well-structured code repository (Git) with clear setup instructions.
- Configuration to support at least 100 initial document types, with documentation on how to add more.
- Evaluation report (test set, metrics per document type, known limitations and edge cases).
- Optional: Basic dashboard or logs to monitor parsing performance and errors over time.
How to Apply
Please start your proposal with the word "OCR-PIPELINE" so we know you read the post.
Include the following:
- A brief summary of your experience with OCR and document parsing (1-2 paragraphs).
- Links or descriptions of similar projects you've completed (especially OCR / document AI work).
- Your preferred tech stack for this project (OCR engine, ML framework, API framework, cloud provider if any).
- A rough outline of your approach to building a robust OCR + parsing pipeline for mixed document types.
- Your estimated timeline and total cost or hourly rate.
Looking forward to your proposals.