Python Developer for Patent XML/PDF Data Ingestion Pipeline - Fixed Scope Trial
Budget: $500 (fixed price)
⭐ 5.00 (2)
IND
api-integration, python, xml, etl-pipelines, postgresql
I need a Python data engineer to build a fixed-scope ingestion/parser milestone for patent data.
This is not a full platform rebuild. The goal is to take the provided sample patent data files and implement a clean, testable ingestion pipeline that parses XML/PDF-source metadata into structured outputs. Sample files are provided.
Initial milestone scope:
1. Inspect provided Chinese patent XML sample package structure.
2. Build a Python parser for the sample XML files.
3. Extract key patent fields including:
- publication/application identifiers
- claims
- claim numbers
- independent/dependent claim indicators where available
- description sections
- bibliographic metadata
- legal/current-owner metadata where available
4. Output parsed data into clean structured tables or files suitable for PostgreSQL loading.
5. Provide clear source-to-target field mapping.
6. Add basic tests using the provided sample files.
7. Provide runnable setup instructions.
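To make the scope above concrete, here is a minimal sketch of what steps 2 to 5 could look like, using only the Python standard library. Every tag name, XPath, column name, and the inline sample document are assumptions made for illustration; the real ones would have to come from inspecting the provided sample package. The rule that a claim containing a claim reference is dependent is also an assumption, not a confirmed property of the sample data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document. Real tag names must be taken from the
# provided sample package; these are placeholders for illustration.
SAMPLE_XML = """<patent-document>
  <bibliographic-data>
    <publication-reference><doc-number>CN000000000A</doc-number></publication-reference>
    <application-reference><doc-number>000000000000.0</doc-number></application-reference>
    <invention-title>Example title</invention-title>
  </bibliographic-data>
  <claims>
    <claim num="1"><claim-text>A device comprising a widget.</claim-text></claim>
    <claim num="2"><claim-text>The device of <claim-ref idref="1">claim 1</claim-ref>, wherein the widget is blue.</claim-text></claim>
  </claims>
</patent-document>"""

# Source-to-target mapping (assumed): target column -> path in the source XML.
BIB_FIELD_MAP = {
    "publication_number": "./bibliographic-data/publication-reference/doc-number",
    "application_number": "./bibliographic-data/application-reference/doc-number",
    "title": "./bibliographic-data/invention-title",
}

CLAIM_COLUMNS = ["publication_number", "claim_number", "is_independent", "claim_text"]


def _text(elem):
    """Return all nested text of an element, whitespace-normalized, or None."""
    if elem is None:
        return None
    return " ".join("".join(elem.itertext()).split())


def parse_patent(xml_string):
    """Parse one document into a bibliographic record and a list of claim rows."""
    root = ET.fromstring(xml_string)
    record = {col: _text(root.find(path)) for col, path in BIB_FIELD_MAP.items()}

    claims = []
    for claim in root.findall("./claims/claim"):
        num = claim.get("num")
        claims.append({
            "publication_number": record["publication_number"],
            # Tolerate missing or non-numeric claim numbers in messy data.
            "claim_number": int(num) if num and num.isdigit() else None,
            # Assumption: a claim that references another claim is dependent.
            "is_independent": claim.find(".//claim-ref") is None,
            "claim_text": _text(claim),
        })
    return record, claims


def claims_to_csv(claims):
    """Serialize claim rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CLAIM_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(claims)
    return buf.getvalue()


if __name__ == "__main__":
    record, claims = parse_patent(SAMPLE_XML)
    print(record)
    print(claims_to_csv(claims))
```

Keeping the mapping in one dictionary means the source-to-target documentation in step 5 and the parser share a single definition, and the inline sample shows how the basic tests in step 6 might be shaped once the real files replace it.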
Possible follow-on work may include:
- Korean PDF description extraction
- Japanese bulk XML full-text parsing
- PostgreSQL integration
- translation pipeline integration
Important:
- This first milestone does not include dashboard work.
- This first milestone does not include production deployment.
- This first milestone does not include legal translation.
- Please do not estimate a large open-ended rebuild. I am looking for a practical fixed-scope parser/data pipeline milestone.
Ideal freelancer:
- Strong Python experience
- Comfortable with XML parsing
- Comfortable with messy real-world data files
- Experience with ETL/data pipelines
- PostgreSQL experience is helpful
- Patent data experience is a plus but not required
Please include in your proposal:
1. Similar XML/ETL parsing work you have done.
2. How you would structure the parser.
3. What you would deliver for the fixed-price milestone.
4. Confirmation that you understand this is a bounded trial milestone, not a full platform rebuild.