Data Engineer / Consultant (Hands-On)
Budget: -
Hourly / Full-time
⭐ 5.00 (142)
United States
bigquery, database-architecture, etl-pipelines, big-data, data-science, data-management, python, sql, data-analysis, data-modeling
We're a fast-growing consumer electronics brand (connected air purifiers, expanding into the broader home environment) with a mature, senior-grade data platform already in production. We need a hands-on data engineer who can do three things: keep it running, build new flows on top of it, and help us actually turn the data into business decisions and revenue.
The platform is real and well-built — the gap is ownership and the "last mile." We have built a pipeline that runs largely headless today. We want someone who will take the wheel, keep it healthy, extend it as we add sources, and — critically — close the loop so the business consumes and acts on what it produces.
The Stack (already in production)
Sources (11+): Amazon SP-API, Shopify GraphQL, Walmart Marketplace, Flowspace 3PL, Tuya IoT (device telemetry — filter life, AQI, runtime), Klaviyo, Yotpo, Recharge, plus reference/environmental feeds (currency, AirNow AQI, NASA FIRMS wildfire data).
Ingestion: ~12 custom Python extractors (one is built on dlt), shared auth/util libraries.
Orchestration: Prefect Cloud + Google Cloud Run, scheduled flows (hourly / 6-hour / daily / monthly), freshness gates, Slack alerting.
Warehouse: BigQuery — raw → staging → clean → analytics, modeled in dbt (~30 staging models, clean "business logic" tables, analytics tables, 15 custom tests, macros for identity normalization).
Outbound / reverse ETL: Tuya→Klaviyo device-and-filter sync (drives recurring filter-resale revenue), Airtable inventory sync, monthly Avalara tax export, a FastAPI internal dashboard.
Infra: Cloud Run, Artifact Registry, Secret Manager, Docker.
You don't have to learn all of this on day one — but you should be immediately comfortable in BigQuery, dbt, Python, and a modern orchestrator.
What You'll Do — Three Core Jobs
1. Keep the lights on (KTLO).
Own the day-to-day health of the pipeline. Keep the existing flows running, fix extractors when source APIs change, maintain the dbt models and tests, watch freshness/alerting, and ingest new data streams as we add channels, products, and tools. This is the foundation — it has to be boringly reliable.
2. Build new flows — activate the data.
Stand up new pipelines that do something with the data, not just store it. Reverse-ETL audiences and signals back into the tools the business uses (Klaviyo, ad platforms, internal tools), build identity-resolution logic across channels (e.g., resolving anonymous Amazon orders to known customers), and wire up the data behind product features (notifications, device-synced subscriptions, environmental alerts). You'll turn a warehouse of facts into operational triggers.
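As a rough illustration of the kind of identity-resolution logic meant here, the sketch below matches marketplace orders to known customers on a normalized name plus postal code. The match rule, the field names (`ship_name`, `ship_postal_code`, `customer_id`), and the US-style ZIP handling are all assumptions made for the example, not a description of the existing platform.

```python
from typing import Dict, List, Optional


def match_key(name: Optional[str], postal_code: Optional[str]) -> Optional[str]:
    """Build a normalized match key, or None when either part is missing."""
    if not name or not postal_code:
        return None
    norm_name = " ".join(name.lower().split())  # lowercase, collapse whitespace
    # Assumes US-style ZIP codes: "94107-1234" and "94107" both become "94107".
    norm_zip = postal_code.strip().upper().replace(" ", "")[:5]
    return f"{norm_name}|{norm_zip}"


def resolve_orders(orders: List[Dict], customers: List[Dict]) -> List[Dict]:
    """Attach customer_id to each order when exactly one known customer matches."""
    known: Dict[str, str] = {}
    ambiguous = set()
    for c in customers:
        key = match_key(c.get("name"), c.get("postal_code"))
        if key is None:
            continue
        if key in known and known[key] != c["customer_id"]:
            ambiguous.add(key)  # two different customers share this key
        known.setdefault(key, c["customer_id"])

    resolved = []
    for o in orders:
        key = match_key(o.get("ship_name"), o.get("ship_postal_code"))
        matched = key is not None and key not in ambiguous
        resolved.append({**o, "customer_id": known.get(key) if matched else None})
    return resolved
```

Leaving ambiguous keys unmatched is a deliberately conservative choice: a wrong stitch is usually more costly downstream than a missing one. In practice this logic would more likely live in the warehouse as a dbt model than in Python.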
3. Monetize the data — help us decide and earn with it.
This is what makes this role more than maintenance. Our platform was built ahead of its consumers: the engineering is strong, but the "business front door" (dashboards, clean metrics, decision-ready outputs) was never finished. You'll help close that gap — define and validate the metrics that matter (true cross-channel revenue, filter-replacement/retention cohorts, LTV, inventory/reorder signals), build the consumption layer (dashboards / Slack numbers / spreadsheets people will actually open), and partner with us on the analyses that drive real decisions: which customers to target, what to reorder, where revenue is leaking, and how to grow subscription/filter-resale revenue.
We're explicit that we want a builder who also advises — someone with a point of view on what's worth building, what's over-built, and where the leverage is. You'll work directly with our PM/ops/marketing leads, not in a silo.
What We're Looking For
Must-haves:
- Strong SQL + dbt. You've built and maintained a modeled warehouse (staging → marts), written tests, and reasoned about business definitions encoded in models.
- BigQuery (or comparable cloud warehouse) in production.
- Python for data extraction/transformation — building and debugging API extractors, handling auth, rate limits, pagination, incremental loads.
- Orchestration experience (Prefect, Airflow, Dagster, or similar) and comfort deploying on cloud infra (GCP / Cloud Run a plus).
- Reverse ETL / activation — you've pushed warehouse data back into operational tools (Klaviyo, CRMs, ad platforms, reverse-ETL tooling).
- A consultant's instinct. You can look at a stack, tell us what's earning its keep vs. what's dark/unused, and recommend where to invest — and explain it to non-technical stakeholders.
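To make the extractor requirement above concrete, here is a minimal sketch of an incremental pull with cursor pagination and exponential backoff on rate limits. The `fetch_page` callable, the `RateLimited` exception, and the response shape (`records`, `next_cursor`, `updated_at`) are hypothetical stand-ins and do not correspond to any specific source API in the stack.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional


class RateLimited(Exception):
    """Raised by the hypothetical fetch function when the API answers HTTP 429."""


def extract_incremental(
    fetch_page: Callable[[str, Optional[str]], Dict],
    since: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield every record updated after `since`, following cursors page by page."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(since, cursor)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
        yield from page["records"]
        cursor = page.get("next_cursor")
        if cursor is None:
            return


def next_watermark(records: List[Dict], since: str) -> str:
    """Return the newest updated_at seen, or the old watermark if nothing came back.

    Assumes ISO-8601 timestamps in a single timezone, so string order is time order.
    """
    return max((r["updated_at"] for r in records), default=since)
```

A run would collect the rows, load them, and only then persist `next_watermark(rows, since)` as the starting point for the next run, so that a failed load does not skip data.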
Strongly preferred:
- E-commerce / marketplace data experience — Amazon SP-API (RDT/PII quirks), Shopify, Walmart, 3PL/inventory data.
- IoT / device telemetry experience (Tuya or similar) — datapoints, device-to-cloud, fleet data.
- Identity resolution / customer 360 / cross-channel stitching.
- BI / dashboarding (Looker Studio, Metabase, Hex, or similar) to build the consumption layer.
- Marketing/lifecycle data fluency (Klaviyo, attribution, cohorts, LTV).
- AI-native workflow — you use Cursor / Claude Code / Copilot to move faster.