← Trabalhos

Python Developer — Build Automated Web Scraper + HTML Report Generator

Orçamento: $20.0 - $30.0 HOURLY / AS_NEEDED ⭐ 4.67 (9) United States

python, data-scraping, automation, html

Monthly data pipeline: scrape Patreon comments → categorize → output HTML tracker I run a music reaction YouTube channel (200K+ subscribers) with an active Patreon community. Each month, members leave comments requesting specific artists and songs. I need a recurring pipeline that scrapes those comments, parses them into structured data, and outputs a clean HTML page I can paste directly into Squarespace. WHAT THE JOB INVOLVES: Scrape — pull comment data from a Patreon post each month (existing Python scraper code provided) Parse & categorize — extract artist name, song title, and request category from free-text comments (see the hard part below) Generate HTML — output a formatted tracker page ready to paste into Squarespace (existing generator code provided) Maintain — run the pipeline monthly, handle edge cases, and refine parsing accuracy over time The hard part — please read before applying: The core challenge is parsing messy, unstructured free-text into clean artist/song/category data. Patreon members write requests in every format imaginable — no consistent structure, lots of typos, ambiguous phrasing, mixed languages, and partial information. Examples of what you'll encounter: "please do hozier!! take me to church or work song either one" "Sabrina carpenter esp. please please please 🙏" "not sure if you've done this but bohemian rhapsody??" "lana rey dark paradise or young and beautiful" "anything by the weekend" You'll need a strategy - whether regex + lookup tables, an LLM-assisted parsing step, fuzzy matching against a song database, or something else. There's no perfect solution, and some ambiguity will require judgment calls. If structured NLP or messy data parsing isn't your thing, this project will be frustrating. Please self-select accordingly. What's provided Existing Python scraper code Existing HTML generator code Sample comment dataset to test against Clear output spec (Squarespace HTML embed) Ideal background Python (intermediate+) NLP or fuzzy text parsing experience Comfort with ambiguous, imperfect data
Abrir na Upwork