Component Data Engineer — Datasheet & Teardown Data Extraction for Electronics LCA
Budget: $225,000 (fixed price)
⭐ 5.00 (1)
USA
python, sql
We build automated Life Cycle Assessment (LCA) tools for electronics supply chains. We need a Component Data Engineer to build the component ground-truth corpus that feeds our model training and validation. The contractor extracts and structures component-level data from electronics datasheets, public teardowns, and supplier disclosures: die size, package, process node, function class, manufacturer, and manufacturer-to-fab attribution. Our predictive models stay in-house; the contractor does the data-engineering work that feeds them.
Scope:
1. Component identification corpus. Build a curated, sourced corpus of structured component data extracted from electronics datasheets at scale: die area, package type, process node, function class, manufacturer, and sub-part identifiers. Cover the component classes that dominate our customer BOMs (ICs, passives, connectors, magnetics) with full provenance per row.
2. Teardown data extraction. Mine public teardown sources (iFixit, FCC filings, repair guides, conference talks, vendor whitepapers) for component lists, die-area measurements, packaging details, and board-level layouts on products in our customer pipeline. Reconcile teardown observations against vendor datasheets where both exist.
3. Manufacturer-to-fab attribution. Build the mapping from manufacturer + part family to the foundry / fab where production occurred. Where direct disclosure is unavailable, document the inference path and assign a confidence level rather than simply recording the fab as unknown.
4. Feature-engineering inputs. Deliver the structured feature columns that the in-house models consume (model work itself is out of scope): die-area distributions per node, package mix, manufacturer share, function-class share, and comparable aggregates. Ship versioned releases on a defined cadence, each with a changelog.
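For illustration only, a corpus row with provenance, a fab attribution with its inference path, and one derived feature (package mix) might look like the sketch below. Every field name, the confidence scale, and all sample values are hypothetical placeholders chosen for this example; they are not our actual schema or real component data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class FabAttribution:
    foundry: str          # placeholder name, not a real attribution
    confidence: str       # assumed scale: "disclosed", "high", "medium", "low"
    inference_path: str   # free-text note on how the mapping was derived


@dataclass(frozen=True)
class ComponentRecord:
    part_number: str
    manufacturer: str
    function_class: str
    package_type: str
    die_area_mm2: Optional[float]      # None when no source reports it
    process_node_nm: Optional[float]   # None when no source reports it
    source_type: str                   # "datasheet", "teardown", "supplier_disclosure"
    source_ref: str                    # provenance pointer for this row
    fab: Optional[FabAttribution] = None


def package_mix(records: Iterable[ComponentRecord]) -> Dict[str, float]:
    """Return the share of records per package type (shares sum to 1)."""
    counts = Counter(r.package_type for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pkg: n / total for pkg, n in counts.items()}


# Made-up sample rows, purely to exercise the sketch.
records = [
    ComponentRecord("EXAMPLE-001", "ExampleCo", "power_management", "QFN",
                    4.2, 180.0, "datasheet", "example-datasheet-001.pdf",
                    FabAttribution("ExampleFoundry", "medium",
                                   "inferred from a hypothetical supplier note")),
    ComponentRecord("EXAMPLE-002", "ExampleCo", "microcontroller", "QFN",
                    None, 40.0, "teardown", "example-teardown-report"),
    ComponentRecord("EXAMPLE-003", "OtherCo", "application_processor", "BGA",
                    85.0, 7.0, "datasheet", "example-datasheet-003.pdf"),
]

mix = package_mix(records)
```

Keeping the inference path and confidence on the attribution itself means a low-confidence mapping stays visible and traceable in every release instead of collapsing into a bare "unknown".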
Required:
- Hands-on data engineering with messy, unstructured inputs (PDFs, scans, datasheets, scraped HTML).
- Familiarity with electronics components: ICs, passives, connectors, magnetics, and the basics of semiconductor packaging.
- Comfortable reading electronics datasheets and pulling structured fields out of them at scale.
- Python (or equivalent) for pipeline work; SQL for corpus storage and querying.
- Strong data hygiene: sourcing, traceability, units, provenance.
Nice to have:
- Semiconductor packaging knowledge (BGA, QFN, WLCSP, flip-chip, die-on-leadframe).
- Familiarity with process nodes and manufacturer-to-fab mappings (TSMC, Samsung, Intel, GlobalFoundries, SMIC, UMC).
- LLM-assisted extraction experience (vision-language models on datasheets, structured-output prompting).
- Familiarity with public teardown sources (iFixit, TechInsights, ChipRebel, FCC filings).
- Familiarity with semiconductor LCA literature.
Engagement: Fixed-scope project, approximately 9 calendar months, starting 2026-09-01. Remote, EU time zone preferred. NDA and IP assignment under our standard contractor agreement.