
Full-Stack Developer Needed for OCR-Based Product Intake and Catalog Workflow

Budget: $170.0 FIXED / ⭐ 5.00 (4) FRA

python, javascript, api-integration, postgresql, web-application, ocr-algorithms, computer-vision, data-processing, automation

We are looking for an experienced full-stack developer to build a web-based inventory intake and catalog preparation module from scratch. The module should help operators turn photos of physical products into structured, reviewed, export-ready product records. The system must support image intake, OCR/AI-assisted field extraction, manual review, validation, grouping of similar items, audit history, and preparation of clean product data for downstream catalog or e-commerce publication.

This is not a simple OCR demo. We need a reliable internal operations tool with a practical UI, strong validation, traceable data changes, and a backend that can later be extended with additional integrations.

Core workflow

1. Photo intake
- Operators should be able to upload product photos from mobile or desktop.
- The interface should support camera capture on mobile devices.
- The system should show a queue of incoming photos.
- Queue items should have statuses such as new, opened, confirmed, rejected, archived, or deleted.
- Operators should be able to filter, select, archive, reject, restore, or delete queue items.
- Each image should keep metadata such as upload time, source, original filename, file type, file size, and processing status.
- The system should generate safe preview/thumbnail images for the UI.

2. OCR / AI preflight
- Before running extraction, the backend should check whether the image has already been processed.
- The system should compute a stable file hash and use cached OCR/AI results when available.
- The UI should show whether processing can run from cache or would require an external OCR/AI request.
- Paid/external AI calls should be controlled by explicit settings and should not happen accidentally.
- The preflight result should explain why extraction is available, blocked, or requires confirmation.

3. OCR / AI extraction
- The backend should extract structured product fields from the image.
- The extraction layer should support OCR and optional AI-assisted completion for missing fields.
- The implementation should be modular so different OCR/AI providers can be swapped or added later.
- The extraction response should include:
  - extracted field values;
  - confidence or source metadata per field;
  - cache status;
  - provider usage counters;
  - warnings or errors;
  - raw technical audit data for debugging.

4. Manual review UI
- Operators must review extracted fields before saving.
- The UI should show product photo preview and editable fields side by side.
- Each field should display whether it came from OCR, AI, dictionary lookup, or manual entry.
- Operators should be able to correct values manually.
- Some fields should be required, with validation before confirmation.
- If required fields are missing, confirmation should be blocked unless the operator provides an explicit override reason.
- The UI should support field suggestions from a dictionary or reference dataset where available.

5. Product confirmation
- After review, the system should create a confirmed product record.
- The confirmed record should include:
  - normalized product fields;
  - original OCR output;
  - operator-reviewed fields;
  - field metadata;
  - image/media references;
  - validation warnings;
  - confirmation timestamp;
  - audit JSON.
- Confirmed records should be stored in a database and optionally also as JSON artifacts for audit/debugging.

6. Approved products table
- The module should show all confirmed product records in a searchable table.
- Required table features:
  - search;
  - filters by key product attributes;
  - warning/clean filter;
  - sorting;
  - pagination or load more;
  - thumbnail preview;
  - full image preview;
  - selection of one or multiple records;
  - edit selected record;
  - delete selected record;
  - export to XLSX/CSV.
- The table should clearly show whether each product has enough identity data to be prepared for publication.

7. Grouping and normalization
- The module should group related confirmed records into publication/product groups.
- Grouping should detect items that represent the same sellable product or variant.
- The normalizer should produce a canonical product name and standardized attributes.
- The grouped preview should show:
  - group name;
  - quantity;
  - member records;
  - missing required codes/data;
  - conflicts;
  - duplicate risks;
  - warnings;
  - blockers;
  - whether the group is ready or needs manual review.
- The grouping/normalization rules should be deterministic and auditable.

8. Persisted normalization audit
- The system should be able to save a normalization run as an audit snapshot.
- A saved run should record:
  - run ID;
  - timestamp;
  - rule version;
  - number of source rows;
  - number of groups;
  - ready/blocked/manual-review counts;
  - group-level before/after data;
  - item-level before/after data;
  - blockers;
  - warnings;
  - rule hits;
  - risk flags.
- The UI should allow reviewing the latest saved normalization run.
- The UI should support filtering saved audit groups by all/blocked/manual/risk/ready/etc.
- The audit run should be exportable as CSV.

9. Catalog/export preparation
- The module should prepare clean output data for a downstream catalog or e-commerce system.
- The first version may generate a preview/export instead of writing directly to the external platform.
- The output should include structured product fields, options/attributes, identifiers, quantity, description fields, and media references where available.
- The system should perform preflight validation before generating or writing anything:
  - required fields present;
  - identifiers present;
  - no blocking duplicates;
  - quantity valid;
  - no conflicting group members;
  - image/media policy clear;
  - price/publication fields either present or explicitly marked as deferred.
- Any future external write should require explicit operator confirmation and should store writeback status, external product ID, timestamp, and verification result.

10. Safety and audit requirements
- No accidental external writes.
- No accidental paid AI calls.
- Clear distinction between preview/dry-run and real write.
- Every important action should be auditable.
- The system should avoid trusting arbitrary file paths from the browser.
- Uploaded files should be stored only in controlled server directories.
- The backend should validate file type, size, path, and status transitions.
- Error states should be visible to the operator and useful for debugging.

Suggested technical stack

We are flexible, but the developer should be comfortable with a stack similar to:
- Python backend
- Flask/FastAPI/Django or similar
- PostgreSQL
- JavaScript/TypeScript frontend
- OCR / image processing
- external API integration
- background-safe processing patterns
- CSV/XLSX export
- clean admin UI design

A lightweight but well-structured implementation is preferred over an over-engineered system.

Expected deliverables

- Working web module with authenticated/admin access support.
- Mobile-friendly photo upload and review UI.
- Backend API for intake, preflight, extraction, confirmation, approved products, grouping, audit runs, and export.
- Database schema for intake items, confirmed products, media, audit events, normalization runs, groups, and group items.
- OCR/AI integration layer with cache and cost controls.
- Manual review and validation workflow.
- Approved products table with filtering, editing, deletion, previews, and export.
- Grouped publication preview and persisted audit view.
- Documentation:
  - setup instructions;
  - environment variables;
  - API overview;
  - database schema notes;
  - OCR/provider configuration;
  - operator workflow;
  - deployment notes.
- Basic tests for critical backend logic and validation rules.
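
To illustrate the kind of validation rule the tests should cover, here is a minimal Python sketch of the "block confirmation unless an override reason is given" check from the manual review step. It is only an illustration, not a prescribed design: the field names (`name`, `sku`, `brand`), the `REQUIRED_FIELDS` tuple, and the `check_confirmation` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical required fields; the real list would come from project configuration.
REQUIRED_FIELDS = ("name", "sku", "brand")


@dataclass
class ConfirmationCheck:
    allowed: bool
    missing: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def check_confirmation(fields: dict, override_reason: Optional[str] = None) -> ConfirmationCheck:
    """Block confirmation when required fields are empty, unless an override reason is given."""
    missing = []
    for key in REQUIRED_FIELDS:
        value = fields.get(key)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(key)

    if not missing:
        return ConfirmationCheck(allowed=True)

    reason = (override_reason or "").strip()
    if reason:
        # Allowed, but the override is kept as a warning so it can be written to the audit record.
        return ConfirmationCheck(True, missing, [f"override: {reason}"])
    return ConfirmationCheck(False, missing)
```

A check like this is easy to unit-test in isolation, and the returned `missing` and `warnings` lists could be stored with the confirmed record as validation warnings.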

Candidate requirements

Please apply only if you have experience with:
- building production web applications, not only prototypes;
- Python backend development;
- database-backed workflows;
- OCR, computer vision, or AI-assisted data extraction;
- file upload handling and image processing;
- admin/operator UI development;
- API integrations;
- data validation and audit trails;
- clean, maintainable code;
- practical communication and clear progress updates.

Nice to have

- Experience with inventory, catalog, warehouse, e-commerce, or product data workflows.
- Experience with OCR providers such as Azure, Google Vision, AWS Textract, Tesseract, or similar.
- Experience with AI/LLM APIs for structured extraction or field completion.
- Experience with CSV/XLSX import/export.
- Experience designing tools for non-technical operators.

Important confidentiality note

The project description here is intentionally generic. The selected freelancer will receive more specific examples and sample data after NDA/confidentiality agreement if needed. Please do not request private business data in the proposal. We can start with anonymized sample images and generic product records.

How we like to work

- Start with a short technical plan and architecture proposal.
- Build in small milestones.
- First milestone should prove the core photo upload → OCR/preflight → manual review → confirmed record flow.
- Later milestones can add grouping, audit snapshots, export, and external platform preparation.
- We prefer clear written updates, screenshots or short demo videos, and practical questions when requirements are ambiguous.

Please include in your proposal

1. Similar projects you have built.
2. Your recommended technical stack.
3. How you would implement OCR/AI cost controls and caching.
4. How you would design the database tables for this workflow.
5. Estimated milestones and timeline.
6. Any risks or assumptions you see.
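
As a reference point for item 3, the preflight behaviour described in the core workflow could look roughly like the Python sketch below. It is a simplified illustration under stated assumptions, not a required implementation: the `PreflightResult` type, the `preflight` function, the in-memory `cache` dictionary, and the `allow_paid_calls` flag are all hypothetical stand-ins for whatever storage and settings the real system uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreflightResult:
    status: str      # "cached", "needs_confirmation", or "blocked"
    file_hash: str
    reason: str      # human-readable explanation shown to the operator


def file_sha256(data: bytes) -> str:
    """Stable content hash: identical bytes always map to the same cache key."""
    return hashlib.sha256(data).hexdigest()


def preflight(data: bytes, cache: dict, allow_paid_calls: bool) -> PreflightResult:
    """Decide whether extraction can run from cache, needs confirmation, or is blocked.

    `cache` is assumed to map file hashes to stored OCR/AI results;
    `allow_paid_calls` is assumed to come from an explicit setting.
    """
    digest = file_sha256(data)
    if digest in cache:
        return PreflightResult(
            "cached", digest,
            "result available from cache; no external request needed")
    if not allow_paid_calls:
        return PreflightResult(
            "blocked", digest,
            "no cached result and paid/external calls are disabled")
    # No external call is made here; the operator must confirm it first.
    return PreflightResult(
        "needs_confirmation", digest,
        "no cached result; an external OCR/AI request would be made")
```

In this sketch the preflight step never calls a provider itself. It only reports what would happen, so a paid request can occur only after a separate, explicit confirmation.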