
Build Automated Insurance PDF Scraper (MVP Project)

Budget: $200.00 fixed price / ⭐ 4.99 (262) United States

data-scraping, javascript, python, automation

I'm looking for a Python developer with web scraping experience to build an automated system that discovers and downloads publicly available insurance documents. This is an MVP project. The goal is to build a working collection pipeline that can later be expanded into a much larger dataset.

Documents of interest include:
- Insurance applications
- Underwriting guides
- Carrier forms
- Endorsements
- Product brochures
- Coverage summaries
- Agent resources
- State filing documents
- Policy forms
- ACORD-related documents

The system should:
- Crawl public websites
- Find downloadable documents
- Download PDFs and other common document types
- Save source URLs
- Extract basic metadata
- Organize files into a structured folder or database

Important:
- Remove only exact duplicate files.
- Keep similar documents, different versions, different carriers, and state-specific variations.
- The purpose is to build a large AI training dataset, so document variations are valuable.

Preferred Tech:
- Python
- Playwright, Scrapy, Crawl4AI, or similar tools
- PostgreSQL or simple structured storage

Deliverables:
- Working scraper
- Source discovery process
- Documentation for running the scraper
- Initial collection of at least 2,000-5,000 documents from public sources

When applying, please include:
1. Similar scraping projects you've completed
2. Tools/frameworks you would use
3. Estimated timeline
4. Fixed-price quote

This project may lead to a larger engagement if the initial system performs well.
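
To illustrate the "remove only exact duplicate files" requirement, here is a minimal sketch of one possible approach: group downloaded files by the SHA-256 hash of their bytes, so that only byte-identical files are flagged and different versions, carriers, or state-specific variants are left alone. The function names (`sha256_of_file`, `find_exact_duplicates`) and the folder layout are assumptions for illustration, not part of the project spec.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder: Path) -> dict[str, list[Path]]:
    """Map each content hash to the files sharing it, keeping only hashes seen more than once.

    Hypothetical helper: files that differ by even one byte get different
    hashes, so near-duplicates and revised versions are never grouped.
    """
    by_hash: dict[str, list[Path]] = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            by_hash.setdefault(sha256_of_file(path), []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

In a pipeline like the one described, the first path in each group could be kept and the rest dropped, with every source URL still recorded against the surviving file.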