Develop a Python Tool for Large-Scale Public PDF Data Extraction | HIRING ASAP | PDF Documents
Budget: $20.00 (fixed price)
⭐ 4.84 (7)
United States
data-scraping, python, data-mining
I’m looking for an experienced Python developer to build a robust, well-structured script/tool for automated data extraction from a public government portal.
Project Overview:
The target is a public health inspection website that hosts thousands of PDF reports. The goal is to create a reliable tool that can systematically collect and organize large numbers of these public PDF files.
Key Requirements:
Build a Python-based solution (Playwright, Selenium, or Scrapy preferred) capable of handling large volumes of data.
The tool must locate and download the PDF files and organize them into a clean folder structure based on entity (restaurant) names.
Must include proper rate limiting, respectful request handling, and the ability to resume interrupted sessions.
The script should be clean, well-documented, and follow best practices for ethical web data extraction.
Deliver the complete script along with setup instructions and documentation so I can run and maintain it.
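For illustration only, the folder-organization and resume requirements could be handled roughly as in the minimal sketch below. It assumes each report is uniquely identified by its URL and that the entity name is already known when the PDF link is found. `safe_folder_name`, `DownloadLog`, and `target_path` are hypothetical names, and so is the one-JSON-object-per-line log format. None of them come from an existing codebase or library.

```python
import json
import re
import unicodedata
from pathlib import Path


def safe_folder_name(entity_name: str, max_len: int = 80) -> str:
    """Turn an entity/restaurant name into a filesystem-safe folder name."""
    # Decompose accented characters, then drop the non-ASCII marks ("é" -> "e").
    text = unicodedata.normalize("NFKD", entity_name)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Replace each run of characters other than letters, digits, '-' and '_'
    # with a single underscore, and trim underscores from the ends.
    text = re.sub(r"[^A-Za-z0-9_-]+", "_", text).strip("_")
    return text[:max_len] or "unknown_entity"


class DownloadLog:
    """Append-only record of finished downloads, used to resume a stopped run."""

    def __init__(self, path: Path):
        self.path = path
        self.done: set[str] = set()
        if path.exists():
            with path.open(encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:
                        self.done.add(json.loads(line)["url"])

    def is_done(self, url: str) -> bool:
        return url in self.done

    def mark_done(self, url: str, saved_to: Path) -> None:
        # One JSON object per line; the file is closed (and so flushed) after
        # every record, so an interruption loses at most the current download.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"url": url, "file": str(saved_to)}) + "\n")
        self.done.add(url)


def target_path(root: Path, entity_name: str, pdf_name: str) -> Path:
    """Build root/<entity folder>/<pdf_name>, creating the folder if needed."""
    # Assumption: pdf_name has already been reduced to a safe file name.
    folder = root / safe_folder_name(entity_name)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / pdf_name
```

A crawler built this way would check `log.is_done(url)` before each download. It would call `log.mark_done(url, path)` only after the file is fully written, so a restarted run skips reports that are already saved. Writing each PDF to a temporary name and renaming it on completion would also guard against half-written files.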
Important:
This project requires a responsible, professional approach. The tool must be designed to interact with the target site in a respectful manner (appropriate delays, error handling, etc.). I am only interested in working with developers who understand how to build reliable and considerate data extraction tools.
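As a rough sketch of the "appropriate delays, error handling" expectation, the class below uses only the standard library. It keeps requests at least `min_interval` seconds apart. It retries 429 responses, 5xx responses, and network errors with exponential backoff, and it fails fast on other client errors such as 404. `PoliteFetcher`, the 2-second default interval, the retry counts, and the placeholder `User-Agent` string are all illustrative assumptions, not values the target site is known to require. The `fetch`, `sleep`, and `clock` parameters exist only so the timing logic can be tested without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


class PoliteFetcher:
    """Fetch URLs with a minimum gap between requests and backoff on failures."""

    def __init__(
        self,
        min_interval: float = 2.0,   # assumed polite delay, tune per site
        max_retries: int = 3,
        backoff_base: float = 5.0,
        # Hypothetical placeholder; a real tool should name itself and a contact.
        user_agent: str = "inspection-pdf-archiver/0.1 (contact: you@example.com)",
        fetch: Optional[Callable[[str], bytes]] = None,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.backoff_base = backoff_base
        self.user_agent = user_agent
        self._fetch = fetch or self._urllib_fetch
        self._sleep = sleep
        self._clock = clock
        self._last_request: Optional[float] = None

    def _urllib_fetch(self, url: str) -> bytes:
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.read()

    def _wait_turn(self) -> None:
        # Sleep just long enough that consecutive requests start at least
        # min_interval seconds apart.
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last_request = self._clock()

    def get(self, url: str) -> bytes:
        for attempt in range(self.max_retries + 1):
            self._wait_turn()
            try:
                return self._fetch(url)
            except urllib.error.HTTPError as exc:
                # Errors like 404 will not fix themselves; only retry on
                # 429 (Too Many Requests) and 5xx server errors.
                if exc.code != 429 and exc.code < 500:
                    raise
                error: Exception = exc
            except (urllib.error.URLError, TimeoutError) as exc:
                error = exc
            if attempt == self.max_retries:
                raise error
            # Exponential backoff: 5 s, 10 s, 20 s, ... with the defaults.
            self._sleep(self.backoff_base * 2 ** attempt)
        raise RuntimeError("unreachable")
```

The same idea carries over to the preferred frameworks. Scrapy, for example, has built-in `DOWNLOAD_DELAY`, `AUTOTHROTTLE_ENABLED`, and `RETRY_TIMES` settings. Checking the site's robots.txt with the standard library's `urllib.robotparser.RobotFileParser` before crawling is another natural addition.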
Please include in your proposal:
Your experience building similar large-scale or PDF-heavy extraction tools
This is a script/tool development project only.