Full-Stack Developer / Data Extraction Engineer
Budget: $6.00 - $19.00
HOURLY / PART_TIME
⭐ 4.82 (79)
United States
data-extraction, crawlers, etl, api, python, mysql, postgresql
Title: Data Extraction & Segmentation
Project Overview
We need to extract, normalize, and structure data from a CRM environment containing approximately:
* 100,000+ company records
* 30,000-40,000 target companies for detailed extraction
* 60,000+ contact records
* Potentially 1M+ activity records
The extracted data will be used for market intelligence, lead sourcing, lead enrichment, and AI workflows.
IMPORTANT FIRST MILESTONE
Before proposing browser automation or scraping, the developer must determine whether our CRM's APIs can be used.
Evaluate:
CRM REST API
CRM Bulk API
SOQL access
Export capabilities
Object relationships
Custom Salesforce objects
Deliverable:
A technical assessment recommending one of the following:
API approach
Bulk API approach
Browser automation approach (only if necessary)
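For the API assessment, a quick first check is whether a plain REST query can page through an entire object. The sketch below is minimal and assumes Salesforce-style query responses with `records`, `done`, and `nextRecordsUrl` fields, which the SOQL and custom-object items above suggest. The names `query_all` and `fetch_json` and the API version string are hypothetical placeholders. Authentication is left out.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import quote

JSON = Dict[str, Any]


def query_all(
    fetch_json: Callable[[str], JSON],
    soql: str,
    api_version: str = "v59.0",  # assumption: use a version the org supports
) -> Iterator[JSON]:
    """Yield every record for `soql`, following nextRecordsUrl until done.

    `fetch_json` is injected (hypothetical) so the HTTP layer, including
    authentication, stays outside this sketch and it can be tested offline.
    """
    url = f"/services/data/{api_version}/query?q={quote(soql)}"
    while True:
        page = fetch_json(url)
        yield from page.get("records", [])
        # A page without "done" is treated as the last one.
        if page.get("done", True):
            break
        url = page["nextRecordsUrl"]
```

At the volumes listed above (potentially 1M+ activity records), the Bulk API is generally the better fit for full pulls. A loop like this one is mainly useful during the assessment, for sizing objects and spot-checking fields.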
CRM Environment
The environment contains custom healthcare data including:
Companies (Buyers)
Buyer types include:
Private Equity
Strategic Acquirers
Search Funds
Family Offices
Independent Sponsors
Operators
Broker/Intermediaries
Listings
Transactions across:
Home Health
Hospice
Behavioral Health
Dental
Imaging
Other Healthcare Services
Opportunities
Buyer interest and transaction activity.
Contacts
Buyer executives and deal professionals.
Activity History
Historical buyer engagement.
Listing History
Historical transaction changes.
PRIMARY OBJECTIVE
Create a structured buyer intelligence database.
Every buyer, contact, opportunity, listing, activity, and history record should be captured and normalized.
DATA EXTRACTION REQUIREMENTS
1. Company / Buyer Accounts
Capture ALL available fields.
Examples include:
Account ID
Account Name
Parent Account
Website
Phone
Billing Address
Billing City
Billing State
Billing Zip
Geography
Company Category
Specialty
Type
Buyer Profile
Additional Profile Notes
Important Notes
Revenue / Size
Amount of Capital Available
Source of Funding
Healthcare Experience
Current Healthcare Holdings
Salesperson
Account Owner
Account Owner Alias
Created Date
Modified Date
Last Activity Date
Account History
Store complete Buyer Profile text.
Store complete Notes fields.
No truncation.
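One way to meet the no-truncation requirement is to land raw records in a staging table. Every column is PostgreSQL `TEXT`, which has no declared length limit, and a JSONB column holds a copy of the full record. This is a sketch under that assumption. The `stg_` table naming, the `raw` column, and the label-to-column rule are illustrative choices, not an existing schema.

```python
import re
from typing import Iterable, List, Set


def to_column(label: str) -> str:
    """Turn a CRM field label into a snake_case column name.

    'Billing Zip' -> 'billing_zip', 'Revenue / Size' -> 'revenue_size'.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()


def staging_ddl(table: str, labels: Iterable[str]) -> str:
    """Build a PostgreSQL staging table in which every field is TEXT.

    TEXT has no declared length cap, so Buyer Profile and Notes text are
    not truncated. The whole source record is also kept in a JSONB column
    (hypothetical name: raw) so later transforms can re-parse it.
    """
    seen: Set[str] = set()
    cols: List[str] = []
    for label in labels:
        col = to_column(label)
        if col in seen:
            # Two labels collapsing to one name would silently drop data.
            raise ValueError(f"column name collision: {label!r} -> {col!r}")
        seen.add(col)
        cols.append(f"    {col} TEXT")
    cols.append("    raw JSONB NOT NULL")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"
```

Typed columns such as dates and amounts can then be derived from staging in a later transform step, without re-extracting.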
2. Buyer Contacts
Capture ALL contact records.
Examples:
Contact ID
Account ID
Name
Title
Email
Phone
Mobile
Role
Source
Salesman
Created Date
Last Activity
Related Contacts
Preserve all contact relationships.
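Contact relationships can be stored simply as an edge list of Contact ID pairs. This sketch assumes the extraction yields (contact, related contact) pairs. Whether those links are symmetric is an open question for the assessment. For that reason, `directed` is a hypothetical switch that keeps both directions when the CRM records them as distinct.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def relationship_edges(pairs: Iterable[Edge], directed: bool = False) -> List[Edge]:
    """Deduplicate contact-to-contact links into a sorted edge list.

    Assumption: with directed=False a link A->B is the same relationship
    as B->A and is stored once. Self-links are dropped.
    """
    seen: Set[Edge] = set()
    for a, b in pairs:
        if a == b:
            continue
        if not directed and b < a:
            a, b = b, a  # canonical order for undirected links
        seen.add((a, b))
    return sorted(seen)
```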
3. Listings
Capture ALL listing fields.
Examples:
Financial Data
Revenue
EBITDA
Asking Price
Final Sale Price
Fee %
Fee Amount
Transaction Data
Listing Status
Listing Type
Close Date
LOI Execution Date
LOI Expiration Date
Offer Price
Marketing Data
Original Broadcast Date
Rebroadcast Date
Last Targeted Broadcast
Last Book Follow-Up
Date of Most Recent Financials
Seller Data
Seller Name
Seller Email
Seller Mobile
Seller Address
Notes
Important Notes
Approvals & Exceptions
Cancellation Reason
Post-Cancellation Follow-Up Notes
Payment Notes
Capture all raw text.
No truncation.
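Listing financials may be stored as free text. If so, one option is to keep the raw string (per "capture all raw text") and derive a numeric value alongside it whenever the text parses. The formats accepted below are assumptions about how values were entered: an optional `$`, thousands separators, and `k`/`m`/`mm` suffixes.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed suffix conventions; "mm" is common shorthand for million in deal text.
_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000, "mm": 1_000_000}
_MONEY = re.compile(r"^\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([a-z]*)$")


def parse_money(raw: Optional[str]) -> Optional[Decimal]:
    """Parse '$1,250,000', '750k', or '1.2MM' into a Decimal.

    Returns None for anything it does not recognize, such as 'TBD'.
    The raw text is meant to be stored unchanged in its own column.
    """
    if raw is None:
        return None
    match = _MONEY.match(raw.strip().lower())
    if not match:
        return None
    number, suffix = match.groups()
    if suffix not in _MULTIPLIERS:
        return None
    return Decimal(number.replace(",", "")) * _MULTIPLIERS[suffix]
```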
4. Connected Opportunities
Capture every opportunity connected to every listing.
Examples:
Opportunity ID
Listing ID
Listing Title
Buyer Account
Primary Contact
Stage
Status
Notes
Internal Notes
Created Date
Close Date
Salesperson
Created By Alias
This table will become the core buyer activity dataset.
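Opportunities must be tied to every listing, so a referential check after extraction can catch gaps early. This sketch assumes each opportunity record carries `Opportunity ID` and `Listing ID` keys, named after the fields above. `link_opportunities` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def link_opportunities(
    opportunities: Iterable[Mapping[str, str]],
    listing_ids: Set[str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group Opportunity IDs by Listing ID and collect orphans.

    An orphan is an opportunity whose listing is missing or was not
    part of the extracted listings.
    """
    by_listing: Dict[str, List[str]] = defaultdict(list)
    orphans: List[str] = []
    for opp in opportunities:
        listing_id = opp.get("Listing ID")
        if listing_id in listing_ids:
            by_listing[listing_id].append(opp["Opportunity ID"])
        else:
            orphans.append(opp["Opportunity ID"])
    return dict(by_listing), orphans
```

Orphans point to listings that were filtered out or missed. They are worth reporting before this table is treated as the core activity dataset.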
5. Activity History
Capture all activities, including:
Emails
Calls
Meetings
Follow-Ups
Tasks
Fields:
Subject
Date
Assigned To
Related To
Contact
Notes
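With potentially 1M+ activity records, pulling history in date windows keeps each query or bulk job bounded and makes reruns cheap. This sketch builds half-open monthly windows and matching SOQL filter strings. The `CreatedDate` field and the monthly granularity are assumptions. SOQL datetime literals are written unquoted in `YYYY-MM-DDThh:mm:ssZ` form.

```python
from datetime import date
from typing import List, Tuple


def month_windows(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end) into half-open windows that break on month starts."""
    windows: List[Tuple[date, date]] = []
    lo = start
    while lo < end:
        # First day of the following month (December rolls into January).
        hi = date(lo.year + (lo.month == 12), lo.month % 12 + 1, 1)
        hi = min(hi, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def soql_filter(field: str, lo: date, hi: date) -> str:
    """SOQL WHERE fragment for one window (field name is an assumption)."""
    return (
        f"{field} >= {lo.isoformat()}T00:00:00Z "
        f"AND {field} < {hi.isoformat()}T00:00:00Z"
    )
```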
6. Listing History
Capture all listing history records.
Examples:
Price Changes
Status Changes
Buyer Changes
LOI Events
Sale Events
Payment Events
Store:
Date
User
Action
Old Value
New Value
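Listing history stores an Old Value and a New Value for each action, so the value an attribute held on any date can be rebuilt by replaying rows in date order. This sketch assumes history rows are dicts keyed by the field names above. It also assumes `Action` identifies the tracked attribute, for example a hypothetical `"Price Change"` value.

```python
from datetime import date
from typing import Any, Iterable, Mapping, Optional


def value_as_of(
    history: Iterable[Mapping[str, Any]],
    action: str,
    as_of: date,
) -> Optional[str]:
    """Return the attribute's value on `as_of` by replaying history rows.

    Before the first change, the value is that row's Old Value. Each
    change dated on or before `as_of` then applies its New Value.
    """
    rows = sorted((r for r in history if r["Action"] == action), key=lambda r: r["Date"])
    if not rows:
        return None
    value = rows[0]["Old Value"]
    for row in rows:
        if row["Date"] > as_of:
            break
        value = row["New Value"]
    return value
```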
REQUIRED CLASSIFICATION ENGINE
The developer should create derived classification tags where possible.
Buyer Type
Private Equity
Strategic Buyer
Search Fund
Independent Sponsor
Family Office
Operator
Broker
Healthcare Focus
Home Health
Hospice
Behavioral Health
Dental
Imaging
Physician Practice
Other
Geographic Focus
State
Region
National
Activity Status
Book Sent
CA Sent
CA Executed
Pending Approval
Conference Call
Due Diligence
LOI
Closed Won
Closed Lost
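A first pass at the classification engine could be rule-based, matching keywords in fields such as Type, Company Category, and Buyer Profile against each tag. The rules below are hypothetical placeholders for the Buyer Type tags. Real rules would come from reviewing the CRM's actual values. The same pattern extends to Healthcare Focus and Geographic Focus.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword rules per Buyer Type tag (matched case-insensitively).
BUYER_TYPE_RULES: Dict[str, List[str]] = {
    "Private Equity": [r"\bprivate equity\b"],
    "Strategic Buyer": [r"\bstrategic\b"],
    "Search Fund": [r"\bsearch funds?\b"],
    "Independent Sponsor": [r"\bindependent sponsors?\b"],
    "Family Office": [r"\bfamily offices?\b"],
    "Operator": [r"\boperators?\b"],
    "Broker": [r"\bbrokers?\b", r"\bintermediar(y|ies)\b"],
}


def tag_buyer_types(*texts: Optional[str]) -> List[str]:
    """Return every Buyer Type tag whose patterns match any of the texts.

    A buyer can carry several tags; empty or missing fields are skipped.
    """
    blob = " ".join(t for t in texts if t).lower()
    return [
        tag
        for tag, patterns in BUYER_TYPE_RULES.items()
        if any(re.search(p, blob) for p in patterns)
    ]
```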
ANALYTICS REQUIREMENTS
Database should support:
Buyer Velocity
Listings Received
Books Sent
CAs Executed
Conference Calls
LOIs Submitted
Deals Closed
Buyer Activity Score
Recency
Frequency
Conversion Rate
Industry Focus
Track buyer activity by industry, including but not limited to:
Home Health
Hospice
Behavioral Health
Dental
Geographic Focus
Track buyer activity by:
State
Region
National
High
Recent Activity
Active
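The Buyer Velocity counts and the Buyer Activity Score (recency, frequency, conversion rate) could be computed along the following lines. The stage names mirror the Buyer Velocity list. Everything else is an assumption to be tuned rather than an agreed definition: the look-back window, the frequency cap, the weights, and the use of deals closed divided by listings received as the conversion rate.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple

FUNNEL = [
    "Listings Received", "Books Sent", "CAs Executed",
    "Conference Calls", "LOIs Submitted", "Deals Closed",
]


def velocity(stage_events: Iterable[str]) -> Dict[str, int]:
    """Count a buyer's events per Buyer Velocity stage (zero if none)."""
    counts = Counter(stage_events)
    return {stage: counts.get(stage, 0) for stage in FUNNEL}


def activity_score(
    activity_dates: List[date],
    listings_received: int,
    deals_closed: int,
    today: date,
    window_days: int = 365,   # assumption: look-back window
    frequency_cap: int = 24,  # assumption: activities per window that count as "max"
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # assumption
) -> float:
    """Weighted 0-100 score from recency, frequency, and conversion rate."""
    if activity_dates:
        days_since = (today - max(activity_dates)).days
        recency = min(1.0, max(0.0, 1 - days_since / window_days))
    else:
        recency = 0.0
    recent = sum(1 for d in activity_dates if 0 <= (today - d).days <= window_days)
    frequency = min(1.0, recent / frequency_cap)
    conversion = deals_closed / listings_received if listings_received else 0.0
    w_rec, w_freq, w_conv = weights
    return round(100 * (w_rec * recency + w_freq * frequency + w_conv * conversion), 1)
```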
What I'd Add to the Job Description
Data Warehouse & CRM Requirements
The developer will create a structured PostgreSQL/Supabase database that serves as the system of record for EVI.
Database will store:
Companies
Contacts
Listings
Opportunities
Activities
Listing History
Account History
Buyer Intelligence Scores
Classification Tags
Supporting:
Buyer ranking
Buyer matching
Buyer outreach
Market intelligence
AI recommendations
Future SaaS applications
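When loading the tables listed above, parent rows must exist before the rows that reference them. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to derive a load order. The table names and foreign-key dependencies are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical foreign-key dependencies: table -> tables it references.
DEPENDS_ON: Dict[str, Set[str]] = {
    "companies": set(),
    "contacts": {"companies"},
    "listings": set(),
    "opportunities": {"listings", "companies", "contacts"},
    "activities": {"companies", "contacts", "opportunities"},
    "listing_history": {"listings"},
    "account_history": {"companies"},
    "classification_tags": {"companies"},
    "buyer_intelligence_scores": {"companies", "opportunities", "activities"},
}


def load_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order tables so each loads after everything it references.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```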
Preferred Skills
Required
API
Bulk API
SOQL
Python
PostgreSQL
ETL Development
Data Engineering
If API Access Is Not Available
Experience with:
Apify
Playwright
Browser Automation
Salesforce Scraping
Session Management
Large-Scale Data Extraction
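For browser automation, or any other long-running extraction, resumability matters as much as speed. This sketch records completed batch keys in a JSON checkpoint file, so a restarted run skips work that already finished. The batch keys, the file layout, and the name `run_with_checkpoint` are illustrative assumptions.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Iterable, List, Set, Tuple


def load_done(path: Path) -> Set[str]:
    """Read the set of completed batch keys (empty if no checkpoint yet)."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def run_with_checkpoint(
    batches: Iterable[Tuple[str, List[Any]]],
    process: Callable[[List[Any]], Any],
    path: Path,
) -> int:
    """Process (key, rows) batches, skipping keys already checkpointed.

    Returns the number of batches processed in this run.
    """
    done = load_done(path)
    processed = 0
    for key, rows in batches:
        if key in done:
            continue
        process(rows)
        done.add(key)
        # Write to a temp file, then rename over the checkpoint, so a crash
        # mid-write does not leave a half-written file (atomic on POSIX).
        tmp = path.with_suffix(path.suffix + ".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        os.replace(tmp, path)
        processed += 1
    return processed
```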
Proposal Requirements
Please answer:
Would you attempt CRM API/Bulk API extraction first? Why?
Have you extracted custom objects before?
Have you built buyer intelligence databases or platforms before?
Have you worked with datasets exceeding 100,000 records?
What architecture would you recommend for this project?