Build Automated Insurance PDF Scraper (MVP Project)
Budget: $200 (fixed price)
⭐ 4.99 (262)
United States
data-scraping, javascript, python, automation
I'm looking for a Python developer with web scraping experience to build an automated system that discovers and downloads publicly available insurance documents.
This is an MVP project. The goal is to build a working collection pipeline that can later be expanded into a much larger dataset.
Documents of interest include:
- Insurance applications
- Underwriting guides
- Carrier forms
- Endorsements
- Product brochures
- Coverage summaries
- Agent resources
- State filing documents
- Policy forms
- ACORD-related documents
The system should:
- Crawl public websites
- Find downloadable documents
- Download PDFs and other common document types
- Save source URLs
- Extract basic metadata
- Organize files into a structured folder or database
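As a rough illustration of the "find downloadable documents" step, the sketch below pulls document links out of a fetched HTML page using only the Python standard library. The extension list, the class name DocLinkParser, and the function find_document_links are assumptions made for this example, not requirements of the project; a real crawler built on Scrapy or Playwright would likely also check Content-Type headers, since some document URLs carry no file extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed set of "common document types"; adjust to the project's needs.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx"}


class DocLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that look like documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.doc_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL so the source URL is saved in full.
        absolute = urljoin(self.base_url, href)
        # Check the extension on the URL path only, ignoring any query string.
        ext = posixpath.splitext(urlparse(absolute).path)[1].lower()
        if ext in DOC_EXTENSIONS and absolute not in self.doc_urls:
            self.doc_urls.append(absolute)


def find_document_links(html, base_url):
    """Return document URLs found in `html`, in page order, without repeats."""
    parser = DocLinkParser(base_url)
    parser.feed(html)
    return parser.doc_urls
```

The returned URLs double as the "source URL" to store alongside each downloaded file.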
Important:
- Remove only exact duplicate files.
- Keep similar documents, different versions, different carriers, and state-specific variations.
- The purpose is to build a large AI training dataset, so document variations are valuable.
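One common way to satisfy the "exact duplicates only" rule is to compare a cryptographic hash of each file's bytes, so that two files are treated as duplicates only when they are byte-for-byte identical. The sketch below assumes SHA-256 and uses hypothetical helper names (sha256_of_bytes, split_exact_duplicates); similar documents, new versions, and state-specific variants produce different hashes and are therefore kept.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Hex digest of the raw file content."""
    return hashlib.sha256(data).hexdigest()


def split_exact_duplicates(files):
    """Separate first-seen files from byte-identical repeats.

    `files` is an iterable of (name, content_bytes) pairs.
    Returns (kept, dropped), where `dropped` pairs each duplicate
    with the name of the file it duplicates.
    """
    first_seen = {}  # digest -> name of the first file with that content
    kept, dropped = [], []
    for name, data in files:
        digest = sha256_of_bytes(data)
        if digest in first_seen:
            dropped.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            kept.append(name)
    return kept, dropped
```

Because the comparison is on content rather than filename or URL, the same PDF mirrored on two carrier pages is stored once, while a revised edition with even a one-byte difference is retained.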
Preferred Tech:
- Python
- Playwright, Scrapy, Crawl4AI, or similar tools
- PostgreSQL or simple structured storage
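For the "simple structured storage" option, a single table holding the source URL, content hash, file path, and basic metadata may be enough for an MVP. The sketch below uses SQLite from the Python standard library as a stand-in; the table name, column names, and helper functions are all hypothetical, and the same layout could be moved to PostgreSQL later with minor changes to the SQL.

```python
import sqlite3

# Hypothetical schema: one row per stored file, unique by content hash.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id            INTEGER PRIMARY KEY,
    source_url    TEXT NOT NULL,
    sha256        TEXT NOT NULL UNIQUE,
    file_path     TEXT NOT NULL,
    carrier       TEXT,
    state         TEXT,
    doc_type      TEXT,
    downloaded_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_store(path=":memory:"):
    """Open (or create) the document index; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_document(conn, source_url, sha256, file_path,
                    carrier=None, state=None, doc_type=None):
    """Insert one document row; return False if that exact content is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO documents "
        "(source_url, sha256, file_path, carrier, state, doc_type) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source_url, sha256, file_path, carrier, state, doc_type),
    )
    conn.commit()
    return cur.rowcount == 1
```

The UNIQUE constraint on the hash column means exact duplicates are rejected at insert time, while different versions and state-specific variants get their own rows.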
Deliverables:
- Working scraper
- Source discovery process
- Documentation for running the scraper
- Initial collection of 2,000 to 5,000 documents (2,000 minimum) from public sources
When applying, please include:
1. Similar scraping projects you've completed
2. Tools/frameworks you would use
3. Estimated timeline
4. Fixed-price quote
This project may lead to a larger engagement if the initial system performs well.