Data Scientist / ML Engineer — Sports Data Pipelines, Snowflake & Feature Engineering

Budget: $20.0 - $65.0 HOURLY / FULL_TIME ⭐ 0.00 (0) CAN

data-source-integration, apache-airflow-platform, database-architecture, python, data-science, machine-learning, deep-learning, etl-pipelines

We are looking for a strong Data Scientist, ML Engineer, or Data Systems Engineer to help with an applied sports data modeling and backtesting project. The primary focus is NBA live-game data, but this is not a sports analyst role. We are looking for someone who is excellent with data acquisition, data pipelines, Snowflake, Python, SQL, feature engineering, machine learning workflows, and model-ready research datasets.

We have a structured data environment combining historical pricing data, NBA play-by-play data, and game-context data. The goal is to improve the flow of raw data into clean, usable research tables, build high-quality features, support backtesting, and help evaluate which variables contain predictive value. This role maps directly to the need for stronger data reliability, research workflow, execution monitoring, and production-grade systems described in the Kymelion roadmap.

Sports data experience is helpful, especially NBA or European football / soccer, but the most important requirement is being a strong technical data scientist who can work with messy real-world data and turn it into reliable modeling inputs.

What You’ll Work On
- Acquire, clean, and structure new data sources
- Improve data flow into Snowflake and Python research workflows
- Write SQL queries and build reusable research datasets
- Work with play-by-play, game-context, pricing, and market-style data
- Build model-ready features from messy raw data
- Create player, lineup, rotation, fatigue, foul trouble, injury, rest, and game-state features
- Build Python notebooks for exploratory analysis and modeling
- Support historical simulations and backtests
- Help evaluate calibration, log loss, Brier score, ROI, drawdown, or other relevant metrics
- Improve repeatability of research workflows
- Summarize findings clearly and practically

Required Skills
- Python
- SQL
- Snowflake or similar cloud data warehouses
- Data acquisition / ingestion
- API data ingestion
- Data cleaning and transformation
- ETL / ELT workflows
- Pandas, NumPy, SciPy, scikit-learn
- Feature engineering
- Machine learning workflows
- Predictive modeling
- Model validation and backtesting
- Working with messy real-world datasets

Nice to Have
- Experience building repeatable research datasets
- Experience with live or near-real-time data workflows
- NBA play-by-play or possession-level data
- European football / soccer match-event data
- Odds, pricing, or market-style data
- Time-series data
- XGBoost, PyTorch, TensorFlow, or similar tools
- Airflow, dbt, Dagster, Prefect, or similar tools
- Experience moving raw data into model-ready features
- Experience with trading, forecasting, or market-style research workflows

Deliverables
Initial deliverables may include:
- Clean data acquisition / ingestion workflows
- Snowflake queries and structured research tables
- Python notebooks
- Feature engineering work
- ML-ready datasets
- Backtesting support
- Signal validation analysis
- Written summaries of findings
- Recommendations for improving data quality, workflow reliability, and research speed

Project Structure
We are open to starting with a smaller paid project or trial assignment, then expanding if there is a strong fit. This could become an ongoing part-time or contract-to-hire role.

To Apply
Please include:
- A short overview of your data science / ML / data engineering background
- Relevant work in data acquisition, data pipelines, machine learning, forecasting, or backtesting
- Links to GitHub, Kaggle, notebooks, dashboards, models, pipelines, or prior projects
- Any experience with Python, SQL, Snowflake, APIs, time-series data, or messy real-world datasets
- A brief note on how you would approach turning raw live-game data into useful modeling features
