AI/ML Engineer (LegalTech) – Data Pipeline, Web Scraping & Machine Learning Development
Budget: $15.0 - $30.0
HOURLY / PART_TIME
⭐ 4.95 (8)
United States
machine-learning, tensorflow, data-science, python-sklearn, artificial-intelligence, python, artificial-neural-networks, deeplearn.js, deep-learning, apache-mahout
Overview
We are a growing LegalTech company seeking an experienced AI/ML Engineer to design and build a robust data pipeline that collects, processes, and analyzes legal data to power intelligent machine learning models.
This is a hands-on role for someone who has experience in data engineering, web scraping, machine learning, and deploying production-ready AI solutions. The ideal candidate is comfortable working with large datasets, building scalable pipelines, and developing predictive models that generate meaningful business insights.
Project Scope
The successful candidate will be responsible for designing and implementing an end-to-end AI pipeline, including:
Building automated data ingestion pipelines from multiple public and private data sources.
Developing reliable web scrapers and crawlers to collect legal and business-related information from websites and online databases.
Cleaning, normalizing, and structuring large datasets for machine learning.
Designing and training machine learning models that evaluate and score key legal and business metrics.
Developing feature engineering workflows that improve model accuracy.
Creating repeatable ETL/ELT processes for ongoing data collection.
Evaluating model performance and continuously improving prediction quality.
Working closely with our team to define scoring methodologies and business logic.
Documenting the architecture and implementation for future maintenance.
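As a rough illustration of the cleaning and normalization step listed above, a minimal Python sketch might look like the following. The record schema (`case_id`, `court`, `filed_on`) and the accepted date formats are assumptions made for the example, not part of this posting; the real fields would be defined together with our team.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical normalized record; the actual schema is not defined here."""
    case_id: str
    court: str
    filed_on: Optional[date]


def _parse_date(raw: str) -> Optional[date]:
    # Try a couple of assumed input formats; return None if none match.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalize(raw_rows: Iterable[dict]) -> Iterator[CaseRecord]:
    """Strip whitespace, standardize casing, and drop duplicates or rows without an ID."""
    seen = set()
    for row in raw_rows:
        case_id = re.sub(r"\s+", "", str(row.get("case_id", ""))).upper()
        if not case_id or case_id in seen:
            continue
        seen.add(case_id)
        court = " ".join(str(row.get("court", "")).split()).title()
        yield CaseRecord(case_id, court, _parse_date(str(row.get("filed_on", ""))))
```

Keeping the normalization in one pure function like this makes it easy to rerun on every scheduled collection and to unit-test independently of the scrapers.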
Desired Outcomes
The final solution should be capable of:
Automatically collecting relevant legal and business data.
Continuously updating datasets through scheduled data collection.
Identifying patterns and relationships within legal data.
Assessing key legal and business metrics using machine learning.
Producing structured outputs that can be consumed by our LegalTech platform.
Supporting future expansion into AI-powered legal analytics and decision-support tools.
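To make "structured outputs" more concrete, one possible shape is a small JSON payload per scored item. This is only a sketch: the field names (`entity_id`, `metric`, `score`, `model_version`, `generated_at`) and the 0 to 1 score range are assumptions for illustration, and the actual contract with our platform would be agreed during the project.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ScoreOutput:
    """Hypothetical output record; not an existing platform schema."""
    entity_id: str
    metric: str
    score: float
    model_version: str
    generated_at: str


def to_payload(entity_id: str, metric: str, score: float,
               model_version: str, now: Optional[datetime] = None) -> str:
    """Validate a score and serialize it as a JSON string with stable key order."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie in [0, 1]")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = ScoreOutput(entity_id, metric, score, model_version, timestamp)
    return json.dumps(asdict(record), sort_keys=True)
```

Including a model version and timestamp in every record is one way to keep outputs traceable as datasets and models are updated.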
Required Skills
Python (expert level)
Machine Learning (scikit-learn, XGBoost, LightGBM, TensorFlow, or PyTorch)
Data Engineering
Web Scraping (BeautifulSoup, Scrapy, Playwright, Selenium)
API integrations
SQL and database design
Data cleaning and preprocessing
Feature engineering
Model evaluation and optimization
Git version control
Preferred Experience
Experience building production ML pipelines
Knowledge of NLP and Large Language Models (LLMs)
Experience with Retrieval-Augmented Generation (RAG)
Familiarity with legal, regulatory, or compliance datasets
Experience deploying models using cloud platforms (AWS, Azure, or Google Cloud)
Docker and containerized deployments
Workflow orchestration tools such as Airflow or Prefect
Experience with vector databases and semantic search is a plus
Nice to Have
Experience working with court records, legal documents, contracts, or regulatory data
Knowledge of OCR and document parsing
Experience with embedding models and AI agents
Background in LegalTech, FinTech, GovTech, or RegTech
Deliverables
End-to-end automated data pipeline
Web scraping framework with monitoring and error handling
Clean and structured training datasets
Trained machine learning model(s)
Model evaluation reports and documentation
Deployment scripts and technical documentation
Well-documented, maintainable source code in Git
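For the "monitoring and error handling" part of the scraping deliverable, a minimal sketch of the kind of behavior we have in mind is a fetch helper that logs failures and retries with exponential backoff. The function name, retry count, and delays below are illustrative assumptions, and it uses only the Python standard library rather than any particular scraping framework.

```python
import logging
import time
from typing import Callable
from urllib.request import urlopen

log = logging.getLogger("scraper")


def _default_fetch(url: str) -> bytes:
    # Plain standard-library HTTP GET with a timeout.
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_with_retry(url: str,
                     fetch: Callable[[str], bytes] = _default_fetch,
                     attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch(url); on OSError, log, wait, and retry up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            log.warning("fetch failed (%d/%d) for %s: %s", attempt, attempts, url, exc)
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `fetch` and `sleep` as parameters keeps the helper testable without network access; the logged warnings can feed whatever monitoring the final framework uses.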
What We're Looking For
We're looking for someone who is proactive, communicates well, and enjoys solving complex data and AI challenges. You should be able to recommend the best technical approach rather than simply implementing requirements.
If you have built AI systems that combine data engineering, web scraping, and machine learning in production environments, we'd love to hear from you.
When applying, please include:
Examples of similar AI/ML projects you've completed.
Links to GitHub, portfolio, or published work (if available).
A brief explanation of how you would architect this solution.
Your experience working with large-scale data pipelines and machine learning models.
Your availability and estimated timeline.