Data Scientist / ML Engineer — Sports Data Pipelines, Snowflake & Feature Engineering
Budget: $20.0 - $65.0
HOURLY / FULL_TIME
CAN
data-source-integration, apache-airflow-platform, database-architecture, python, data-science, machine-learning, deep-learning, etl-pipelines
We are looking for a strong Data Scientist, ML Engineer, or Data Systems Engineer to help with an applied sports data modeling and backtesting project.
The primary focus is NBA live-game data, but this is not a sports analyst role. We are looking for someone who is excellent with data acquisition, data pipelines, Snowflake, Python, SQL, feature engineering, machine learning workflows, and model-ready research datasets.
We have a structured data environment combining historical pricing data, NBA play-by-play data, and game-context data. The goal is to improve the flow of raw data into clean, usable research tables, build high-quality features, support backtesting, and help evaluate which variables contain predictive value.
This role directly addresses the needs described in the Kymelion roadmap: stronger data reliability, research workflows, execution monitoring, and production-grade systems.
Sports data experience is helpful, especially NBA or European football / soccer, but the most important requirement is being a strong technical data scientist who can work with messy real-world data and turn it into reliable modeling inputs.
What You’ll Work On
Acquire, clean, and structure new data sources
Improve data flow into Snowflake and Python research workflows
Write SQL queries and build reusable research datasets
Work with play-by-play, game-context, pricing, and market-style data
Build model-ready features from messy raw data
Create player, lineup, rotation, fatigue, foul trouble, injury, rest, and game-state features
Build Python notebooks for exploratory analysis and modeling
Support historical simulations and backtests
Help evaluate models and backtests using calibration, log loss, Brier score, ROI, drawdown, and other relevant metrics
Improve repeatability of research workflows
Summarize findings clearly and practically
Required Skills
Python
SQL
Snowflake or similar cloud data warehouses
Data acquisition / ingestion
API data ingestion
Data cleaning and transformation
ETL / ELT workflows
Pandas, NumPy, SciPy, scikit-learn
Feature engineering
Machine learning workflows
Predictive modeling
Model validation and backtesting
Working with messy real-world datasets
Nice to Have
Experience building repeatable research datasets
Experience with live or near-real-time data workflows
NBA play-by-play or possession-level data
European football / soccer match-event data
Odds, pricing, or market-style data
Time-series data
XGBoost, PyTorch, TensorFlow, or similar tools
Airflow, dbt, Dagster, Prefect, or similar tools
Experience moving raw data into model-ready features
Experience with trading, forecasting, or market-style research workflows
Deliverables
Initial deliverables may include:
Clean data acquisition / ingestion workflows
Snowflake queries and structured research tables
Python notebooks
Feature engineering work
ML-ready datasets
Backtesting support
Signal validation analysis
Written summaries of findings
Recommendations for improving data quality, workflow reliability, and research speed
Project Structure
We are open to starting with a smaller paid project or trial assignment, then expanding if there is a strong fit.
This could become an ongoing part-time or contract-to-hire role.
To Apply
Please include:
A short overview of your data science / ML / data engineering background
Relevant work in data acquisition, data pipelines, machine learning, forecasting, or backtesting
Links to GitHub, Kaggle, notebooks, dashboards, models, pipelines, or prior projects
Any experience with Python, SQL, Snowflake, APIs, time-series data, or messy real-world datasets
A brief note on how you would approach turning raw live-game data into useful modeling features