AI / Python Developer Needed to Build Environmental Due Diligence SaaS MVP (RAG, OCR, & Web Scraping)
Budget: $3,000 (fixed price)
⭐ 0.00 (0)
United States
python, data-scraping, artificial-intelligence
Job Title:
AI / Python Developer Needed to Build Environmental Due Diligence SaaS MVP (RAG, OCR, & Web Scraping)
Job Description:
We are looking for a talented AI/Full-Stack Developer (or Small Agency) to build a Minimum Viable Product (MVP) for a business-to-business (B2B) SaaS platform. The application automates environmental due diligence and Phase I ESA reporting.
The core function of the app is to take a property address, scrape state and federal regulatory databases, use AI to analyze dense, often scanned PDF reports for Recognized Environmental Conditions (RECs), and automatically draft FOIA requests and summary reports.
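For applicants who want a concrete picture of that flow, here is a minimal sketch of how the stages might be wired together. It is an illustration under stated assumptions, not a required design: the names `Record`, `Finding`, and `run_pipeline` are hypothetical, and the three stub functions stand in for the real scraper, OCR service, and AI analysis step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Record:
    """One downloaded document (hypothetical shape)."""
    source: str       # registry the file came from
    pdf_bytes: bytes  # raw file content


@dataclass
class Finding:
    """A flagged passage plus the registry it came from."""
    source: str
    excerpt: str


def run_pipeline(
    address: str,
    fetch_records: Callable[[str], Iterable[Record]],
    extract_text: Callable[[bytes], str],
    find_hazards: Callable[[str], list[str]],
) -> list[Finding]:
    """Address in, findings out: scrape, then OCR, then analyze."""
    findings = []
    for record in fetch_records(address):
        text = extract_text(record.pdf_bytes)
        for excerpt in find_hazards(text):
            findings.append(Finding(record.source, excerpt))
    return findings


# Stub stages for demonstration only. A real build would call the scraper,
# an OCR service, and the LLM-based analysis here instead.
def fake_fetch(address: str) -> list[Record]:
    return [Record("state-registry-demo", b"Tank leak reported in 1994.\nNo further action.")]


def fake_ocr(data: bytes) -> str:
    return data.decode("utf-8")


def fake_analyze(text: str) -> list[str]:
    return [line for line in text.splitlines() if "leak" in line.lower()]


findings = run_pipeline("123 Example St", fake_fetch, fake_ocr, fake_analyze)
for f in findings:
    print(f.source, "|", f.excerpt)
```

Passing each stage in as a function keeps the stages swappable, so the scraper or OCR provider can change without touching the rest of the flow.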
Core Responsibilities & Features to Build:
1. Data Ingestion (Scraping): Build scripts to automatically search and download public environmental records from target federal/state registries based on an address input.
2. Document Processing (OCR & RAG): Implement an advanced OCR pipeline (e.g., AWS Textract or Azure Document Intelligence) to convert messy, historical scanned government PDFs into searchable text.
3. AI Analysis Engine: Build a Retrieval-Augmented Generation (RAG) pipeline using an LLM (OpenAI GPT-4o or Anthropic Claude) and a Vector Database (Pinecone, Chroma, or similar) to accurately search records for specific environmental hazards (spills, tanks, leaks) and cite sources precisely.
4. Automation Output: Build a feature that auto-generates localized FOIA request letters and structured summary reports based on the findings.
5. Frontend/UI: A simple, clean dashboard where users can input addresses, track ongoing FOIA requests, and download reports. We are open to building the frontend in a robust no-code tool such as Bubble.io, provided it connects seamlessly to the Python backend APIs.
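To illustrate what "cite sources precisely" and "auto-generates FOIA request letters" could look like in features 3 and 4, here is a rough sketch. It makes several assumptions: plain keyword matching is swapped in for the embedding search a vector database would provide, the names `Chunk`, `retrieve`, `cite`, and `draft_foia` are hypothetical, and the letter wording is placeholder text rather than an approved FOIA form.

```python
import re
from dataclasses import dataclass
from string import Template

# Hazard terms taken from the examples named in this posting.
HAZARD_TERMS = {"spill", "tank", "leak"}


@dataclass
class Chunk:
    """A page-level slice of OCR text that remembers where it came from."""
    doc: str
    page: int
    text: str


def score(chunk: Chunk) -> int:
    """Count hazard terms that prefix any word (so 'tanks' matches 'tank')."""
    words = set(re.findall(r"[a-z]+", chunk.text.lower()))
    return sum(any(w.startswith(t) for w in words) for t in HAZARD_TERMS)


def retrieve(chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Return the top_k chunks with at least one hazard term, best first."""
    hits = [(score(c), c) for c in chunks]
    hits = [(s, c) for s, c in hits if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hits[:top_k]]


def cite(chunk: Chunk) -> str:
    """Format a document-and-page citation for a chunk."""
    return f"{chunk.doc}, p. {chunk.page}"


# Placeholder wording only; real letters vary by agency and jurisdiction.
FOIA_TEMPLATE = Template(
    "To: $agency\n"
    "Re: Public records request for $address\n\n"
    "I request copies of records related to the following items:\n"
    "$items\n"
)


def draft_foia(agency: str, address: str, hits: list[Chunk]) -> str:
    """Fill the letter template with one cited line per retrieved chunk."""
    items = "\n".join(f"- {c.text} ({cite(c)})" for c in hits)
    return FOIA_TEMPLATE.substitute(agency=agency, address=address, items=items)


chunks = [
    Chunk("report_a.pdf", 3, "Two underground storage tanks were removed after a leak."),
    Chunk("report_a.pdf", 7, "Site vegetation is typical for the region."),
    Chunk("report_b.pdf", 1, "A fuel spill was recorded near the loading dock."),
]
hits = retrieve(chunks)
letter = draft_foia("Example State Agency", "123 Example St", hits)
print(letter)
```

Keeping the document name and page number on every chunk from the OCR step onward is what makes precise citations possible later, whichever retrieval method is used.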
Required Technical Skills:
* Language: Python (Strong expertise)
* AI Frameworks: LangChain, LlamaIndex, or similar LLM orchestration tools
* Vector Databases: Pinecone, Weaviate, Milvus, or Chroma
* APIs: OpenAI API, Anthropic API
* OCR Tools: AWS Textract, Azure Document Intelligence, or Google Cloud Document AI
* Web Scraping: BeautifulSoup, Scrapy, or Selenium
* Database & Frontend: PostgreSQL, React, or advanced Bubble.io API integration
How to Apply:
Please submit a brief proposal detailing:
1. Your experience building RAG (Retrieval-Augmented Generation) applications that analyze large, messy PDFs.
2. An example of a project where you successfully combined web scraping with an AI pipeline.
3. Your estimated timeline and budget approach for a project of this scope (Milestone-based or Hourly).
4. Start your proposal with the word "EcoAI" so I know you read this post.