Automated Web Data Extraction to Daily Excel Workbooks (Python)

Budget: $100.0 FIXED / ⭐ 5.00 (1) USA

crawlers, python, data-scraping, data-extraction, data-mining, microsoft-excel, scrapy-framework, automation

I am looking for an experienced Python developer to build a small, reliable data-capture tool that runs on my local Windows computer. The tool will log into a subscription web dashboard that I have valid access to, extract structured tabular data on a fixed schedule, and write it into well-organized daily Excel workbooks. This is straightforward, professional web-data automation work.

Scope of Work

1. Authenticated access: The source is a paid dashboard that requires login. I will provide my own credentials and grant access; the script must use an authenticated login session securely (e.g., via Playwright) to reach the data pages. Credentials must be stored locally in a config/.env file and never hard-coded.
2. Data extraction: For each category page, capture every record displayed. IMPORTANT: many records have additional detail hidden behind an expandable row. The script MUST click/expand each event row to reveal and capture the full set of underlying fields, not just the surface summary values.
3. Scheduled capture: Run automatically every minute throughout the day. Use a scheduler (the Python schedule library, APScheduler, or Windows Task Scheduler); your recommendation is welcome.
4. Daily workbook rotation: Create one fresh Excel workbook (.xlsx) per calendar day, named by date, e.g. data_2026-06-28.xlsx. The first capture on a new day automatically creates a brand-new file.
5. Workbook structure: Within each daily workbook, create one tab per category (there are multiple categories). Within each category, separate the three record types: Spread, O/U (Totals), and Moneyline, either as sub-tabs or as clearly separated sections (your recommendation).
6. No duplicates / update-in-place: Each unique selection should appear as ONE row per day. On every 1-minute fetch, if a row already exists, UPDATE its values and refresh its capture timestamp rather than adding a duplicate. Only genuinely new selections add new rows.
7. Timestamps: Every row must carry a capture timestamp column showing the most recent update time. The workbook should reflect updates continuously throughout the day.

Technical Requirements

- Language: Python 3.
- Suggested libraries: Playwright or Selenium (for login and expanding rows), pandas, openpyxl.
- Robust handling of page loads, expand-row waits, and intermittent network issues (retries, logging).
- Clean, commented, maintainable code in a small repo with a README.
- A config file for credentials, schedule interval, output folder, and category list.
- Runs reliably unattended on my local Windows machine.

Deliverables

1. Working Python script/package with setup instructions (README).
2. requirements.txt and a sample config file.
3. A short walkthrough (written or short video) showing install and run on my machine.
4. A sample output workbook demonstrating the structure and the dedup/update behavior.

Ideal Candidate

- Strong experience with browser automation (Playwright/Selenium), including handling logged-in sessions and dynamic/expandable content.
- Solid pandas/openpyxl experience for structured Excel output.
- Good communication and clean documentation.

To Apply

Please briefly describe a similar automation you have built, your suggested approach to the expandable-row capture and the no-duplicate/update-in-place logic, and your estimated timeline. Thank you.
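For the credential requirement under Authenticated access, one minimal approach is to read a local .env file with plain Python so that nothing is hard-coded. This is a sketch under assumptions: the variable names DASHBOARD_USER and DASHBOARD_PASSWORD and the helpers load_env_file and get_credentials are hypothetical, and a maintained package such as python-dotenv could replace the hand-written parser.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def load_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a local .env file (hypothetical minimal parser).

    Blank lines, comment lines starting with '#', and lines without '=' are skipped;
    surrounding quotes on values are stripped.
    """
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_credentials(path: Path) -> Tuple[str, str]:
    """Return (username, password); real environment variables win over the file."""
    file_values = load_env_file(path) if path.exists() else {}
    # DASHBOARD_USER / DASHBOARD_PASSWORD are hypothetical variable names.
    user = os.environ.get("DASHBOARD_USER", file_values.get("DASHBOARD_USER"))
    password = os.environ.get("DASHBOARD_PASSWORD", file_values.get("DASHBOARD_PASSWORD"))
    if not user or not password:
        raise RuntimeError(f"Missing DASHBOARD_USER/DASHBOARD_PASSWORD in {path} or environment")
    return user, password
```

The .env file itself should also be kept out of version control (for example via .gitignore), so the repo only ships a sample config.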
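The posting leaves the scheduler choice open. As one option, swapped in here in place of the schedule library or APScheduler, a plain standard-library loop can run the capture at a fixed one-minute rate without drifting and without overlapping runs. The function run_every and its max_runs parameter are hypothetical; max_runs exists only so the loop can be tested.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("capture.scheduler")


def run_every(interval_s: float, job: Callable[[], None],
              max_runs: Optional[int] = None) -> int:
    """Run job at a fixed rate; return the number of runs performed."""
    runs = 0
    next_start = time.monotonic()
    while True:
        try:
            job()
        except Exception:
            # One failed capture must not stop the unattended loop.
            log.exception("capture run failed")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        next_start += interval_s
        now = time.monotonic()
        if next_start < now:
            # The job overran its slot: skip missed slots instead of bursting.
            next_start = now
        time.sleep(next_start - now)
```

In production this would be called as run_every(60, capture_once), where capture_once stands for a hypothetical function performing one extract-and-write cycle. Windows Task Scheduler remains an alternative if each capture should instead run as a separate process launched by the OS.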
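Daily workbook rotation can be reduced to deriving the file name from the local calendar date on every run, so no explicit midnight rollover logic is needed. A sketch, assuming hypothetical helpers daily_workbook_path and workbook_for_now and the data_YYYY-MM-DD.xlsx pattern taken from the example name in the posting:

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional


def daily_workbook_path(output_dir: Path, day: date) -> Path:
    """Return the workbook path for one calendar day, e.g. data_2026-06-28.xlsx."""
    return output_dir / f"data_{day.isoformat()}.xlsx"


def workbook_for_now(output_dir: Path, now: Optional[datetime] = None) -> Path:
    """Pick the file for the current local date; a new date yields a new file name."""
    now = now or datetime.now()  # naive local time of the machine running the tool
    return daily_workbook_path(output_dir, now.date())
```

Because the path is recomputed on every capture, the first run after midnight simply targets a file that does not exist yet, and the writer creates it.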
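The no-duplicates / update-in-place rule amounts to an upsert keyed on whatever uniquely identifies a selection. The sketch below keeps the day's rows in a dictionary keyed on a hypothetical (category, market type, selection id) tuple; DailyTable, upsert, and the captured_at column name are illustrative assumptions, not part of the posting.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Tuple

# Hypothetical key: (category, market type, selection id). Which dashboard fields
# truly identify a selection is an assumption that must be checked on the real site.
RowKey = Tuple[str, str, str]


@dataclass
class DailyTable:
    rows: Dict[RowKey, dict] = field(default_factory=dict)

    def upsert(self, key: RowKey, values: dict, captured_at: datetime) -> bool:
        """Insert or update one selection; return True only if it was new."""
        is_new = key not in self.rows
        row = self.rows.setdefault(key, {})
        row.update(values)  # overwrite with the latest captured values
        row["captured_at"] = captured_at.isoformat(timespec="seconds")
        return is_new
```

After each fetch, the day's rows could be written back to the workbook (for example one pandas DataFrame per category and market type), so the file always mirrors the in-memory table; after a restart, the table would need to be rebuilt from the current day's workbook. Choosing a stable key is the main design decision: if the key changes between fetches, duplicates reappear.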
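For intermittent network issues under Technical Requirements, a small retry wrapper with exponential backoff and logging is one common pattern. with_retries is a hypothetical helper; in the real tool, retry_on might be narrowed to specific errors such as Playwright's TimeoutError (importable from playwright.sync_api) rather than all exceptions.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("capture.retry")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 2.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call fn, retrying with exponential backoff (base_delay, 2x, 4x, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Errors outside retry_on propagate immediately, so a genuine bug (for example a changed page layout) surfaces in the log instead of being retried.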
