Develop a Python Tool for Large-Scale Public PDF Data Extraction | HIRING ASAP | PDF Documents

Budget: $20.0 FIXED / ⭐ 4.84 (7) United States

data-scraping, python, data-mining

I’m looking for an experienced Python developer to build a robust, well-structured script/tool for automated data extraction from a public government portal.

Project Overview:
The target is a public health inspection website that hosts thousands of PDF reports. The goal is to create a reliable tool that can systematically collect and organize large numbers of these public PDF files.

Key Requirements:
- Build a Python-based solution (Playwright, Selenium, or Scrapy preferred) capable of handling large volumes of data.
- The tool must extract and download PDF files while properly organizing them into a clean folder structure based on entity/restaurant names.
- Must include proper rate limiting, respectful request handling, and the ability to resume interrupted sessions.
- The script should be clean, well-documented, and follow best practices for ethical web data extraction.
- Deliver the complete script along with setup instructions and documentation so I can run and maintain it.

Important:
This project requires a responsible, professional approach. The tool must be designed to interact with the target site in a respectful manner (appropriate delays, error handling, etc.). I am only interested in working with developers who understand how to build reliable and considerate data extraction tools.

Please include in your proposal:
- Your experience building similar large-scale or PDF-heavy extraction tools.

This is a script/tool development project only.
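
To make the rate limiting, resume, and folder-organization requirements concrete, here is a minimal sketch of the behaviour I have in mind. It is an illustration only, not a required design: every name in it (safe_folder_name, RateLimiter, Manifest, collect, the manifest file) is hypothetical, it uses only the standard library, and the actual page navigation and download step (Playwright, Selenium, or Scrapy) is assumed to be supplied as the fetch callable.

```python
import json
import re
import time
from pathlib import Path


def safe_folder_name(entity_name: str) -> str:
    """Turn an entity/restaurant name into a filesystem-safe folder name."""
    cleaned = re.sub(r"[^\w\s-]", "", entity_name).strip()
    return re.sub(r"\s+", "_", cleaned) or "unknown_entity"


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_seconds: float):
        self.min_interval = min_interval_seconds
        self._last_request = None

    def wait(self) -> None:
        # Sleep only for whatever part of the interval has not yet elapsed.
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.min_interval - elapsed
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()


class Manifest:
    """Record completed downloads in a JSON file so a run can be resumed."""

    def __init__(self, path: Path):
        self.path = path
        self.done = set(json.loads(path.read_text())) if path.exists() else set()

    def is_done(self, url: str) -> bool:
        return url in self.done

    def mark_done(self, url: str) -> None:
        self.done.add(url)
        # Write to a temporary file first so an interruption mid-write
        # does not leave a half-written manifest behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(self.done)))
        tmp.replace(self.path)


def collect(items, fetch, out_dir: Path, manifest: Manifest, limiter: RateLimiter):
    """Download each (entity_name, pdf_url) pair that is not already recorded.

    fetch is a hypothetical callable taking a URL and returning the PDF bytes.
    """
    for entity_name, url in items:
        if manifest.is_done(url):
            continue  # already downloaded in an earlier session
        limiter.wait()
        data = fetch(url)
        folder = out_dir / safe_folder_name(entity_name)
        folder.mkdir(parents=True, exist_ok=True)
        filename = url.rsplit("/", 1)[-1] or "report.pdf"
        (folder / filename).write_bytes(data)
        manifest.mark_done(url)
```

Rerunning collect with the same manifest file skips every URL already recorded, which is the kind of resume behaviour expected after an interrupted session. The delay value, retry and error handling, and the way entity names and PDF links are discovered on the portal are left to the developer's proposal.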
