Automated Web Data Extraction to Daily Excel Workbooks (Python)
Budget: $100.00 (fixed price)
Client rating: 5.00 (1 review)
Client location: USA
Skills: crawlers, python, data-scraping, data-extraction, data-mining, microsoft-excel, scrapy-framework, automation
I am looking for an experienced Python developer to build a small, reliable data-capture tool that runs on my local computer (Windows). The tool will log into a subscription web dashboard that I have valid access to, extract structured tabular data on a fixed schedule, and write it into well-organized daily Excel workbooks. This is straightforward, professional web-data automation work.
Scope of Work
1. Authenticated access: The source is a paid dashboard that requires login. I will provide my own credentials and grant access. The script must log in and reuse the authenticated session (e.g., via Playwright) to reach the data pages. Credentials must be stored locally in a config/.env file, never hard-coded.
2. Data extraction: For each category page, capture every record displayed. IMPORTANT: many records have additional detail hidden behind an expandable row. The script MUST click/expand each event row to reveal and capture the full set of underlying fields, not just the surface summary values.
3. Scheduled capture: Run automatically every minute throughout the day. Use a scheduler (the Python schedule library, APScheduler, or Windows Task Scheduler); your recommendation is welcome.
4. Daily workbook rotation: Create one fresh Excel workbook (.xlsx) per calendar day, named by date, e.g. data_2026-06-28.xlsx. A new day creates a brand-new file automatically.
5. Workbook structure: Within each daily workbook, create one tab per category (there are multiple categories). Within each category, separate the three market types: Spread, O/U (Totals), and Moneyline. Present them either as separate sheets per category and market type (Excel has no nested sub-tabs) or as clearly separated sections within each category tab; your recommendation is welcome.
6. No duplicates / update-in-place: Each unique selection should appear as ONE row per day. On every 1-minute fetch, if a row already exists, UPDATE its values and refresh its capture timestamp rather than adding a duplicate. Only genuinely new selections add new rows.
7. Timestamps: Every row must carry a capture timestamp column showing the most recent update time. The workbook should reflect updates continuously through the day.
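Below is a minimal sketch of the file and sheet naming in items 4 and 5. It assumes one worksheet per category/market pair, which is one of the two layouts allowed above. Excel rejects `/` in sheet names and limits them to 31 characters, so "O/U (Totals)" has to be rewritten. `workbook_path`, `sheet_name`, and the category names are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Characters Excel rejects in worksheet names: [ ] : * ? / \
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet-name length

# The three market types named in the post.
MARKETS = ("Spread", "O/U (Totals)", "Moneyline")


def workbook_path(output_dir: str, day: date) -> Path:
    """Daily workbook path, e.g. <output_dir>/data_2026-06-28.xlsx."""
    return Path(output_dir) / f"data_{day.isoformat()}.xlsx"


def sheet_name(category: str, market: str) -> str:
    """Excel-safe sheet name for one category/market pair (hypothetical layout)."""
    cleaned = _INVALID_SHEET_CHARS.sub("-", f"{category} - {market}")
    return cleaned[:_MAX_SHEET_NAME_LEN]


if __name__ == "__main__":
    # Hypothetical category names; the real list would come from the config file.
    for category in ("category_a", "category_b"):
        for market in MARKETS:
            print(sheet_name(category, market))
    print(workbook_path("output", date(2026, 6, 28)))
```

Truncation to 31 characters can make two long category names collide, so the real script should check that the generated names are unique within a workbook.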
Technical Requirements
- Language: Python 3.
- Suggested libraries: Playwright or Selenium (for login + expanding rows), pandas, openpyxl.
- Robust handling of page loads, expand-row waits, and intermittent network issues (retries, logging).
- Clean, commented, maintainable code in a small repo with a README.
- A config file for credentials, schedule interval, output folder, and category list.
- Runs reliably unattended on my local Windows machine.
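Here is a minimal sketch of the retry-and-log behavior requested above. It assumes each page load or row expansion is wrapped as a zero-argument callable. `with_retries` is a hypothetical helper, not part of Playwright or Selenium. In a Playwright script, `retry_on` would likely include Playwright's `TimeoutError`. The `sleep` parameter exists only so the backoff can be tested without waiting.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("scraper")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying with exponential backoff on the given exceptions.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == attempts:
                log.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("Attempt %d/%d failed (%s); retrying in %.1fs",
                        attempt, attempts, exc, delay)
            sleep(delay)
    raise ValueError("attempts must be >= 1")
```

Keeping `attempts` small matters with a one-minute schedule. If the total backoff exceeds the interval, the next scheduled run would overlap the retries.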
Deliverables
1. Working Python script/package with setup instructions (README).
2. requirements.txt and a sample config file.
3. A short walkthrough (written or short video) showing install + run on my machine.
4. A sample output workbook demonstrating the structure and dedup/update behavior.
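Below is a sketch of the sample config file and a loader for it, using the standard-library `configparser`. It assumes non-secret settings live in an INI file and credentials live in environment variables, which a `.env` file can populate (for example, via python-dotenv's `load_dotenv()`). The section name, keys, and the `DASHBOARD_USER` / `DASHBOARD_PASSWORD` variable names are hypothetical.

```python
import configparser
import os
from dataclasses import dataclass, field
from typing import Mapping

# Hypothetical sample config.ini (non-secret settings only).
SAMPLE_CONFIG = """\
[scraper]
interval_seconds = 60
output_dir = output
categories = category_a, category_b, category_c
"""


@dataclass(frozen=True)
class Settings:
    username: str
    password: str = field(repr=False)  # keep the password out of logged reprs
    interval_seconds: int
    output_dir: str
    categories: tuple[str, ...]


def load_settings(ini_text: str, env: Mapping[str, str] = os.environ) -> Settings:
    """Combine non-secret settings from INI text with credentials from the environment."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["scraper"]
    categories = tuple(
        name.strip()
        for name in section.get("categories", fallback="").split(",")
        if name.strip()
    )
    return Settings(
        # Credentials come from environment variables (e.g. loaded from .env), never from code.
        username=env["DASHBOARD_USER"],
        password=env["DASHBOARD_PASSWORD"],
        interval_seconds=section.getint("interval_seconds", fallback=60),
        output_dir=section.get("output_dir", fallback="output"),
        categories=categories,
    )
```

At startup, the script might call `load_dotenv()` and then `load_settings(Path("config.ini").read_text(encoding="utf-8"))`. A missing credential then fails fast with a `KeyError` naming the variable.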
Ideal Candidate
- Strong experience with browser automation (Playwright/Selenium), including handling logged-in sessions and dynamic/expandable content.
- Solid pandas/openpyxl experience for structured Excel output.
- Good communication and clean documentation.
To Apply
Please briefly describe a similar automation you have built, your suggested approach to the expandable-row capture and the no-duplicate/update-in-place logic, and your estimated timeline. Thank you.
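Here is one possible approach to the expandable-row and no-duplicate requirements, sketched under several assumptions. The scraper expands each row and merges the summary fields with the revealed detail fields into one flat dict per selection. `(category, market, event, selection)` identifies a selection uniquely within a day; ideally the dashboard exposes a stable ID instead. `DailyStore`, `store_for`, and `KEY_FIELDS` are hypothetical names. Because the workbook is rebuilt from this in-memory map, deduplication becomes a dictionary lookup rather than a search through existing Excel rows.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

# Assumed identity of a "selection"; ideally a stable ID exposed by the dashboard.
KEY_FIELDS = ("category", "market", "event", "selection")


@dataclass
class DailyStore:
    """One day's rows, keyed by selection; dict order = row order in the sheet."""
    day: str  # ISO date, e.g. "2026-06-28"
    rows: dict[tuple[str, ...], dict[str, Any]] = field(default_factory=dict)

    def upsert(self, record: dict[str, Any], captured_at: datetime) -> bool:
        """Add a new row or update the existing one in place.

        Returns True if a new row was added, False if an existing row was updated.
        """
        key = tuple(record[name] for name in KEY_FIELDS)
        is_new = key not in self.rows
        row = self.rows.setdefault(key, {})
        row.update(record)  # latest summary + expanded-detail values win
        row["captured_at"] = captured_at.isoformat(timespec="seconds")
        return is_new


def store_for(current: Optional[DailyStore], now: datetime) -> DailyStore:
    """Reuse today's store, or start an empty one when the calendar day changes."""
    today = now.date().isoformat()
    if current is None or current.day != today:
        return DailyStore(day=today)
    return current
```

On each cycle, the script could rewrite the day's workbook from `rows` (for example, with pandas `DataFrame.to_excel`). Excel locks a workbook while it is open, so the writer should catch `PermissionError` and retry on the next cycle. After a restart, the store would need to be reloaded from that day's workbook.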