
Sports Data Collection & Game State Validation (Kalshi / MLB APIs)

Budget: $1,500 fixed price / ⭐ 5.00 (1) United States

api-integration, data-extraction, etl, sql, pandas, python, data-mining, data-processing, data-analysis

Project Overview

A freelancer is being sought to build a historical sports dataset by collecting, validating, and organizing data from Kalshi and MLB-related sources. This is a fixed-price project with detailed specifications, clearly defined deliverables, and structured milestone payments. The total scope covers MLB games for seasons 2021 to 2026 season-to-date, with an optional expansion milestone covering seasons 2016 to 2020. Additional work may be available upon successful completion of this project.

What You Will Be Doing

You will collect historical data from Kalshi and MLB-related sources and produce a historical dataset according to a predefined 16-field data dictionary. While some fields are straightforward, several require accurately matching historical probability timestamps to the corresponding MLB game state (score, inning, etc.) and validating that the resulting data is correct.

The project involves historical data extraction, game-state reconstruction and validation, quality control, investigation of data discrepancies, and documentation of any assumptions or exceptions. The final deliverable is a research-quality CSV dataset suitable for historical analysis.

A detailed project specification and data dictionary have already been prepared and will be shared with shortlisted candidates during the interview process.

Project Deliverables & Milestones

Phase 1 – Proof of Concept (200 Games): 25% Payment

Populate all required fields defined in the project specification for the 200 most recent completed 2026 MLB games. Any data quality issues, assumptions, exceptions, or limitations discovered during this phase should be documented. Corrections and refinements identified during client review must be incorporated before approval.

Deliverable: Complete and validated CSV dataset for the 200 most recent completed 2026 MLB games, including documentation for any flagged records.
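As an illustration of the timestamp matching that the proof of concept is meant to settle, one possible approach is an as-of lookup: for each probability observation, take the latest game state recorded at or before that timestamp. The sketch below is only that, a sketch. The `GameState` fields, the `state_as_of` helper, and the sample values are all hypothetical; they are not taken from the project specification, the Kalshi API, or any MLB data feed.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GameState:
    # Hypothetical play-by-play snapshot; field names are illustrative only.
    ts: datetime          # when this state became true (UTC)
    inning: int
    half: str             # "top" or "bottom"
    home_score: int
    away_score: int


def state_as_of(states: list[GameState], quote_ts: datetime) -> Optional[GameState]:
    """Return the latest state with ts <= quote_ts.

    `states` must be sorted by ts. Returns None when the quote
    precedes the first recorded state (for example, a pre-game quote).
    """
    times = [s.ts for s in states]
    idx = bisect_right(times, quote_ts)  # number of states at or before quote_ts
    return states[idx - 1] if idx else None


def utc(hour: int, minute: int) -> datetime:
    # Made-up date; every timestamp is timezone-aware UTC to avoid offset bugs.
    return datetime(2026, 5, 1, hour, minute, tzinfo=timezone.utc)


# Invented sample states for a single game.
states = [
    GameState(utc(23, 5), 1, "top", 0, 0),
    GameState(utc(23, 20), 1, "bottom", 0, 1),
    GameState(utc(23, 41), 2, "top", 2, 1),
]

matched = state_as_of(states, utc(23, 30))
print(matched)
```

Normalizing both sources to UTC before the lookup is the design choice that matters most here, since a timezone offset error produces matches that look plausible but are wrong. On DataFrames, `pandas.merge_asof` with `direction="backward"` performs the same kind of join.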
Phase 2 – Recent Historical Dataset (2024–2026 Season-To-Date): 35% Payment

Populate all required fields defined in the project specification for all MLB games from the 2024 season, 2025 season, and 2026 season-to-date. The 2024 and 2025 seasons each contain approximately 2,430 games. As the 2026 season is in progress, the number of 2026 games included will depend on the date of delivery. All refinements, corrections, and methodology adjustments identified during the POC phase must be incorporated into this dataset.

Deliverable: Complete and validated CSV dataset covering 2024, 2025, and 2026 season-to-date MLB games.

Phase 3 – Remaining Historical Dataset & Final Delivery (2021–2023): 40% Payment

Populate all required fields defined in the project specification for all MLB games from the 2021, 2022, and 2023 seasons (approximately 7,290 total games based on 2,430 games per season). Perform final validation checks, cross-year consistency review, and consolidate all project data into a unified dataset.

Deliverable: Complete and validated CSV dataset covering 2021–2026 season-to-date MLB games, including final documentation and consolidated project deliverables.

Phase 4 – Optional Expansion Milestone (2016–2020): $750 Fixed Price

This optional milestone may be awarded at the client's discretion following successful completion of Phases 1–3. The same specification, methodology, and validation standards will be applied to the 2016–2020 MLB seasons (approximately 12,150 total games based on 2,430 games per season).

Deliverable: Complete and validated CSV dataset covering 2016–2020 MLB games.

Required Experience

• Experience working with APIs and structured datasets.
• Experience collecting, transforming, and validating historical data.
• Experience working with large historical datasets and maintaining data accuracy across multiple sources.
• Strong attention to detail and ability to identify, document, and resolve data inconsistencies, missing information, or other data quality issues.
• Ability to clearly document assumptions, exceptions, and data-quality findings.

Nice to Have

• Experience working with sports data APIs, play-by-play data, or sports analytics projects.
• Experience working with prediction-market APIs and market data sources such as Kalshi or Polymarket.
• Python, SQL, or other data-processing experience.
• Familiarity with MLB game structure, scoring, and terminology.

What to Expect

Significant effort has been invested in defining the scope, deliverables, milestones, and data requirements so the freelancer can focus on execution rather than guessing project requirements. Questions are encouraged, assumptions can be discussed, and constructive feedback will be provided throughout the project. Responses to questions will be prompt, clear, and data-driven.

The POC phase is intended to validate methodology early so that any refinements can be incorporated before the larger historical dataset is built. Accuracy and validation are prioritized over speed of completion. The objective is to establish a repeatable methodology during the POC phase that can be applied consistently throughout the larger dataset.

How to Apply

Please apply by providing the following:

• A summary of any experience working with APIs, historical datasets, sports data, or data-validation projects.
• One or two examples of projects you completed that required collecting, validating, or reconciling data from multiple sources.
• A response to the following question: A key requirement of this project is accurately matching historical probability observations to the corresponding MLB game state. How would you validate that a historical probability timestamp has been correctly matched to the corresponding MLB game state (score, inning, etc.)?
Feel free to discuss any data sources, validation methods, or quality-control procedures you would use.
• Your estimated availability and earliest possible start date.

Thoughtful responses are preferred over generic submissions. Candidates selected for interviews will be provided with additional project details and specifications.
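For the validation question above, a response might include automated sanity checks of the following kind. This is a minimal sketch under stated assumptions, not a method prescribed by the client. The row keys, the `flag_matches` helper, the 10-minute `MAX_GAP` tolerance, and the sample rows are all hypothetical. Flagged rows are candidates for manual review rather than automatic rejection, since rain delays or scoring corrections can legitimately trigger a flag.

```python
from datetime import datetime, timedelta, timezone

MAX_GAP = timedelta(minutes=10)  # hypothetical tolerance, not from the specification


def flag_matches(rows, max_gap=MAX_GAP):
    """Return {row_index: [reasons]} for matched rows that fail basic checks.

    Each row is a dict with hypothetical keys: quote_ts, state_ts, inning,
    home_score, away_score. Rows must belong to one game, sorted by quote_ts.
    """
    flags = {}
    prev = None
    for i, row in enumerate(rows):
        reasons = []
        gap = row["quote_ts"] - row["state_ts"]
        if gap < timedelta(0):
            # The matched state did not exist yet when the quote was observed.
            reasons.append("state is later than quote (look-ahead)")
        elif gap > max_gap:
            reasons.append("state is stale relative to quote")
        if prev is not None:
            if row["inning"] < prev["inning"]:
                reasons.append("inning went backwards")
            if (row["home_score"] < prev["home_score"]
                    or row["away_score"] < prev["away_score"]):
                reasons.append("score decreased")
        if reasons:
            flags[i] = reasons
        prev = row
    return flags


# Invented sample rows; the third is deliberately inconsistent.
t0 = datetime(2026, 5, 1, 23, 0, tzinfo=timezone.utc)
m = lambda n: t0 + timedelta(minutes=n)
rows = [
    {"quote_ts": m(30), "state_ts": m(20), "inning": 1, "home_score": 0, "away_score": 1},
    {"quote_ts": m(45), "state_ts": m(41), "inning": 2, "home_score": 2, "away_score": 1},
    {"quote_ts": m(50), "state_ts": m(55), "inning": 2, "home_score": 1, "away_score": 1},
]

flags = flag_matches(rows)
print(flags)
```

Checks like these catch structural errors cheaply. They do not replace spot-checking a sample of matched rows against an independent source.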