
Python Developer for Patent XML/PDF Data Ingestion Pipeline - Fixed Scope Trial

Budget: $500.0 FIXED / ⭐ 5.00 (2) IND

api-integration, python, xml, etl-pipelines, postgresql

I need a Python data engineer to build a fixed-scope ingestion/parser milestone for patent data. This is not a full platform rebuild. The goal is to take provided sample patent data files and implement a clean, testable ingestion pipeline that can parse XML/PDF-source metadata into structured outputs. Sample is provided.

Initial milestone scope:
1. Inspect provided Chinese patent XML sample package structure.
2. Build a Python parser for the sample XML files.
3. Extract key patent fields including:
   - publication/application identifiers
   - claims
   - claim numbers
   - independent/dependent claim indicators where available
   - description sections
   - bibliographic metadata
   - legal/current-owner metadata where available
4. Output parsed data into clean structured tables or files suitable for PostgreSQL loading.
5. Provide clear source-to-target field mapping.
6. Add basic tests using the provided sample files.
7. Provide runnable setup instructions.

Possible follow-on work may include:
- Korean PDF description extraction
- Japanese bulk XML full-text parsing
- PostgreSQL integration
- translation pipeline integration

Important:
- This first milestone does not include dashboard work.
- This first milestone does not include production deployment.
- This first milestone does not include legal translation.
- Please do not estimate a large open-ended rebuild. I am looking for a practical fixed-scope parser/data pipeline milestone.

Ideal freelancer:
- Strong Python experience
- Comfortable with XML parsing
- Comfortable with messy real-world data files
- Experience with ETL/data pipelines
- PostgreSQL experience is helpful
- Patent data experience is a plus but not required

Please include in your proposal:
1. Similar XML/ETL parsing work you have done.
2. How you would structure the parser.
3. What you would deliver for the fixed-price milestone.
4. Confirmation that you understand this is a bounded trial milestone, not a full platform rebuild.
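For illustration only, the kind of parser described in items 2 to 4 of the milestone scope could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the expected deliverable: the element names (`patent-document`, `publication-number`, `application-number`, `claims`, `claim`, `claim-text`, `claim-ref`), the `num` attribute, and the inline sample are hypothetical placeholders, since the real structure has to be read from the provided sample package. Treating a claim as dependent when it contains a `claim-ref` element is likewise an assumed heuristic.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; real tag names must come from the provided package.
SAMPLE_XML = """<patent-document>
  <publication-number>CN000000000A</publication-number>
  <application-number>CN0000000000</application-number>
  <claims>
    <claim num="1"><claim-text>A device comprising a housing.</claim-text></claim>
    <claim num="2"><claim-text>The device of <claim-ref idref="1">claim 1</claim-ref>, further comprising a lid.</claim-text></claim>
  </claims>
</patent-document>"""

CLAIM_COLUMNS = ["publication_number", "claim_number", "is_dependent", "claim_text"]


def parse_patent(xml_text):
    """Return (bibliographic dict, list of claim rows) for one document."""
    root = ET.fromstring(xml_text)
    pub = root.findtext("publication-number", default="").strip()
    app = root.findtext("application-number", default="").strip()

    rows = []
    for claim in root.iterfind("claims/claim"):
        text_el = claim.find("claim-text")
        # itertext() keeps text nested inside inline elements such as claim-ref.
        raw = "".join(text_el.itertext()) if text_el is not None else ""
        num = claim.get("num")
        rows.append({
            "publication_number": pub,
            # Leave the number empty rather than guess when it is missing or malformed.
            "claim_number": int(num) if num and num.isdigit() else None,
            # Assumed heuristic: a claim that references another claim is dependent.
            "is_dependent": claim.find(".//claim-ref") is not None,
            "claim_text": " ".join(raw.split()),
        })
    return {"publication_number": pub, "application_number": app}, rows


def claims_to_csv(rows):
    """Serialize claim rows to CSV text that a PostgreSQL COPY could load."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CLAIM_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    bib, claim_rows = parse_patent(SAMPLE_XML)
    print(bib)
    print(claims_to_csv(claim_rows))
```

Keeping parsing (`parse_patent`) separate from serialization (`claims_to_csv`) is one way to make the field extraction testable against the sample files without a database.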