AI/ML Engineer (LegalTech) – Data Pipeline, Web Scraping & Machine Learning Development

Budget: $15.00 - $30.00 hourly, part-time · ⭐ 4.95 (8) · United States

machine-learning, tensorflow, data-science, python-sklearn, artificial-intelligence, python, artificial-neural-networks, deeplearn.js, deep-learning, apache-mahout

Overview

We are a growing LegalTech company seeking an experienced AI/ML Engineer to design and build a robust data pipeline that collects, processes, and analyzes legal data to power intelligent machine learning models.

This is a hands-on role for someone who has experience in data engineering, web scraping, machine learning, and deploying production-ready AI solutions. The ideal candidate is comfortable working with large datasets, building scalable pipelines, and developing predictive models that generate meaningful business insights.

Project Scope

The successful candidate will be responsible for designing and implementing an end-to-end AI pipeline, including:

- Building automated data ingestion pipelines from multiple public and private data sources.
- Developing reliable web scrapers and crawlers to collect legal and business-related information from websites and online databases.
- Cleaning, normalizing, and structuring large datasets for machine learning.
- Designing and training machine learning models that evaluate and score key legal and business metrics.
- Developing feature engineering workflows that improve model accuracy.
- Creating repeatable ETL/ELT processes for ongoing data collection.
- Evaluating model performance and continuously improving prediction quality.
- Working closely with our team to define scoring methodologies and business logic.
- Documenting the architecture and implementation for future maintenance.

Desired Outcomes

The final solution should be capable of:

- Automatically collecting relevant legal and business data.
- Continuously updating datasets through scheduled data collection.
- Identifying patterns and relationships within legal data.
- Assessing key legal and business metrics using machine learning.
- Producing structured outputs that can be consumed by our LegalTech platform.
- Supporting future expansion into AI-powered legal analytics and decision-support tools.

Required Skills

- Python (expert level)
- Machine Learning (scikit-learn, XGBoost, LightGBM, TensorFlow, or PyTorch)
- Data Engineering
- Web Scraping (BeautifulSoup, Scrapy, Playwright, Selenium)
- API integrations
- SQL and database design
- Data cleaning and preprocessing
- Feature engineering
- Model evaluation and optimization
- Git version control

Preferred Experience

- Experience building production ML pipelines
- Knowledge of NLP and Large Language Models (LLMs)
- Experience with Retrieval-Augmented Generation (RAG)
- Familiarity with legal, regulatory, or compliance datasets
- Experience deploying models using cloud platforms (AWS, Azure, or Google Cloud)
- Docker and containerized deployments
- Workflow orchestration tools such as Airflow or Prefect
- Vector databases and semantic search experience is a plus

Nice to Have

- Experience working with court records, legal documents, contracts, or regulatory data
- Knowledge of OCR and document parsing
- Experience with embedding models and AI agents
- Background in LegalTech, FinTech, GovTech, or RegTech

Deliverables

- End-to-end automated data pipeline
- Web scraping framework with monitoring and error handling
- Clean and structured training datasets
- Trained machine learning model(s)
- Model evaluation reports and documentation
- Deployment scripts and technical documentation
- Well-documented, maintainable source code in Git

What We're Looking For

We're looking for someone who is proactive, communicates well, and enjoys solving complex data and AI challenges. You should be able to recommend the best technical approach rather than simply implementing requirements.

If you have built AI systems that combine data engineering, web scraping, and machine learning in production environments, we'd love to hear from you.

When applying, please include:

- Examples of similar AI/ML projects you've completed.
- Links to GitHub, portfolio, or published work (if available).
- A brief explanation of how you would architect this solution.
- Your experience working with large-scale data pipelines and machine learning models.
- Your availability and estimated timeline.
