Senior Data Scientist
Budget: $15.0 - $40.0
HOURLY / FULL_TIME
⭐ 5.00 (16)
Germany
data-science, python, machine-learning, data-analysis
You will own the data enrichment strategy for a massive archive of world-class journalism. Your mission is to take 25 years of historical content and "hydrate" it—cleaning and tagging it with metadata so it can power next-gen AI products and search tools. You’ll act as a bridge between business leaders and engineering teams, turning complex editorial goals into smart, scalable data pipelines.
Most Important
Senior NLP & ML Experience: 5+ years processing large-scale, unstructured text datasets.
Technical Stack: Advanced proficiency in Python (Pandas, PySpark) and building production-ready ETL pipelines.
NLP Frameworks: Hands-on experience with spaCy or Hugging Face Transformers for entity recognition and categorization.
Search Knowledge: Familiarity with OpenSearch or Elasticsearch, specifically regarding vector embeddings and index mapping.
Taxonomy Design: Ability to design metadata structures that capture the value of diverse content.
Strategy & Consultation: Experience leading technical discovery sessions and translating business needs into technical requirements.
Nice to Have
Legacy Data Handling: Experience working with messy, historical HTML and "dirty" data archives.
Efficiency Focus: Knowledge of using open-source LLMs to process data in a cost-effective way.
Modern Search: Exposure to hybrid search (Lexical + Vector) and graph-based retrieval.
Personal Traits
The Translator: You can explain complex AI concepts to non-technical people without losing them.
The Diplomat: You are great at mediating between different teams with competing priorities.
Pragmatic Thinker: You focus on results and ROI, knowing when a "good enough" model is better than a perfect one that’s too expensive.
Curious Investigator: You enjoy digging into decades of data to find patterns and solve "messy" problems.
Team Player: You enjoy working closely with backend and search engineers to ensure your data actually works in the final product.