Senior Search Architect (Lucene / Elasticsearch / OpenSearch / Algolia) – 200 Million Resumes
Budget: $50.0 - $200.0
HOURLY / PART_TIME
⭐ 4.93 (233)
United States
lucene-search, elasticsearch, python, api
# Senior Search Architect (Lucene / Elasticsearch / OpenSearch / Vector Search) – Build a 200M Candidate Search Platform
# IMPORTANT – Please Read Before Applying
## We are building a search platform comparable to:
* **Juicebox.ai**
* **Exa.ai**
* **LinkedIn Recruiter**
* **LinkedIn Sales Navigator**
If you have built search platforms with similar complexity and scale, we would like to speak with you.
**Please do NOT apply if you are planning to learn these technologies during this project.**
We are specifically looking for someone who has already designed and built large-scale production search systems and can demonstrate previous work.
---
# Overview
We are building a next-generation AI-powered recruiting platform capable of searching **200 million resumes and LinkedIn profiles**.
This is **not** a standard Elasticsearch implementation.
We need an experienced Search Architect who has built production systems capable of:
* Hundreds of millions of searchable documents
* Complex recruiter-style faceted search
* Sub-second response times
* Extremely fast facet counts
* High relevance ranking
* Horizontal scalability
Our goal is to create the best recruiter search platform in the industry.
---
# About the Data
We currently have approximately **200 million resumes and LinkedIn profiles**, all stored in JSON format.
The platform should support continuous ingestion and updates while maintaining excellent search performance.
---
# Responsibilities
Design the overall search architecture and recommend the best technology stack.
Responsibilities include:
* Search architecture
* Data modeling
* Index design
* Ranking strategy
* Relevance tuning
* Faceted search
* Distributed indexing
* Cluster architecture
* Performance optimization
* Capacity planning
* Scalability planning
---
# Search Features
The search experience should feel similar to LinkedIn Recruiter and Juicebox.ai.
Recruiters should be able to search using combinations of:
## Companies
* Current company
* Previous companies
* Multiple companies
* Include companies
* Exclude companies
## Job Titles
* Current title
* Previous titles
* Multiple titles
* Include
* Exclude
## Skills
* Multiple skills
* Required skills
* Excluded skills
* Skill normalization
* Synonyms
## Location
* Country
* State
* City
* ZIP Code
* Radius search (5–100 miles)
* Multiple cities
* Multiple states
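To make the radius requirement concrete, here is a minimal sketch of a great-circle distance check in Python. It is only an illustration: the function names, the coordinate-tuple layout, and the mean Earth radius constant are assumptions, and a production system would more likely rely on the search engine's built-in geo filtering than on application-side math.

```python
import math

# Assumed constant: mean Earth radius in miles.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(
    center: tuple[float, float],
    candidate: tuple[float, float],
    radius_miles: float,
) -> bool:
    """True if the candidate lies within radius_miles of the search center.

    The 5-100 mile bounds mirror the range stated in this posting.
    """
    if not 5 <= radius_miles <= 100:
        raise ValueError("radius_miles must be between 5 and 100")
    return haversine_miles(*center, *candidate) <= radius_miles
```

For example, `within_radius((40.7128, -74.0060), (39.9526, -75.1652), 100)` checks whether a candidate in Philadelphia falls inside a 100-mile search centered on New York City.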
## Education
* Schools
* Universities
* Ivy League filter
* Degrees
* Multiple schools
## Experience
* Total years of experience
* Years at current company
* Years in industry
* Years in specific role
## Additional Filters
* Industry
* Certifications
* Security clearance
* Employment type
* Languages
* Work authorization
* Remote / Hybrid / Onsite
* Seniority
* Management level
Most filters should support:
* Include
* Exclude
* Multiple values
* Boolean AND / OR logic
* Phrase search
* Exact search
* Prefix search
* Fuzzy matching
* Synonyms
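As an illustration of the include / exclude / multiple-value semantics listed above, the sketch below builds a request body in the style of the Elasticsearch/OpenSearch `bool` query. It assumes that stack purely for the sake of example (the posting does not commit to one), and the function name and the field names (`current_company`, `skills`) are hypothetical.

```python
from typing import Any


def build_filter_query(
    include: dict[str, list[str]],
    exclude: dict[str, list[str]],
    match_all: bool = True,
) -> dict[str, Any]:
    """Build a bool query from include/exclude filters.

    Values inside one field are OR-ed (a `terms` clause).
    Fields are AND-ed when match_all is True, OR-ed otherwise.
    Excluded values are always removed via `must_not`.
    """
    include_clauses = [
        {"terms": {field: values}} for field, values in include.items() if values
    ]
    exclude_clauses = [
        {"terms": {field: values}} for field, values in exclude.items() if values
    ]

    bool_query: dict[str, Any] = {"must_not": exclude_clauses}
    if match_all:
        # `filter` clauses must all match and do not affect scoring.
        bool_query["filter"] = include_clauses
    elif include_clauses:
        # At least one of the `should` clauses has to match.
        bool_query["should"] = include_clauses
        bool_query["minimum_should_match"] = 1
    return {"query": {"bool": bool_query}}


# Hypothetical usage: current company is Acme OR Globex, and the
# candidate does not list COBOL as a skill.
body = build_filter_query(
    include={"current_company": ["Acme", "Globex"]},
    exclude={"skills": ["cobol"]},
)
```

Phrase, prefix, fuzzy, and synonym matching are left out of this sketch; they depend on analyzer and mapping decisions that the architect would be expected to recommend.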
---
# Search Performance Requirements
The platform should support:
* Sub-second search response
* Sub-second facet counts
* Fast aggregations
* Fast autocomplete
* Instant suggestions
* Efficient deep pagination
* Millions of searches per day
* Horizontal scalability
* High availability
* Fault tolerance
---
# AI & Semantic Search
We intend to use open-source LLMs to enrich resume data before indexing.
Possible enrichment includes:
* Skill extraction
* Skill normalization
* Company normalization
* Industry classification
* Job title normalization
* Resume classification
* Candidate summarization
We are also open to implementing:
* Vector embeddings
* Semantic search
* Hybrid keyword + vector search
* Retrieval-augmented ranking
We are interested in your recommendations based on production experience.
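For context on what "hybrid keyword + vector search" can look like, one common way to merge a keyword result list with a vector result list is reciprocal rank fusion (RRF). The sketch below is a minimal, engine-agnostic version and not a prescribed design: the function name is hypothetical, and the constant `k = 60` is a conventional default rather than a value tuned for this data.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate IDs returned by a keyword query and a vector query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["b", "a", "d", "c"]
```

Documents that rank well in both lists rise to the top, which is why "b" and "a" lead the fused ranking here.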
---
# Technology
We are **not committed to any specific technology stack**.
We are open to using whichever technologies provide the best scalability, relevance, maintainability, and performance.
Examples include:
### Search
* Lucene
* Elasticsearch
* OpenSearch
* Algolia
* Solr
* Vespa
* Typesense
### Vector Search
* Weaviate
* Qdrant
* Milvus
* Pinecone
* pgvector
* Redis Search
* FAISS
* HNSW
### Databases
We are open to using any database architecture if it makes sense for the solution, including:
* PostgreSQL
* MongoDB
* ClickHouse
* Cassandra
* ScyllaDB
* DynamoDB
* Neo4j
* Other specialized databases
### APIs
Experience with the following is highly desirable:
* GraphQL
* REST APIs
### Graph Technologies
Knowledge of the following is a significant plus for future recruiter relationship intelligence features:
* Neo4j
* Graph databases
* Relationship graphs
---
# What We're Looking For
You should have real-world experience with:
* 100M+ indexed documents
* Elasticsearch/OpenSearch clusters
* Lucene internals
* Distributed indexing
* Large-scale faceted search
* Search relevance tuning
* Custom analyzers
* Synonyms
* Ranking algorithms
* Aggregations
* Performance optimization
* Capacity planning
* High-volume production systems
Experience building recruiting, staffing, ATS, CRM, people search, enterprise search, e-commerce, or internet-scale search products is highly preferred.
---
# Mandatory Requirements
Please **do not apply** unless you have built a comparable production search platform.
Your proposal should include:
* The largest search system you have built
* Approximate number of indexed documents
* Number of daily searches
* Search technology used
* Database(s) used
* Whether embeddings/vector search were used
* Cluster size
* Average query latency
* Average facet latency
* Your role on the project
* Team size
---
# Interview Requirements
You must be able to demonstrate previous work.
Examples include:
* Live demo
* Recorded demo
* Screenshots
* Architecture diagrams
* Code walkthrough
* Performance dashboards
You should be comfortable discussing:
* Architecture decisions
* Indexing strategy
* Ranking
* Relevance tuning
* Scaling strategy
* Search optimization
* Cluster management
* Performance bottlenecks
---
# Please Do Not Apply If
* You have only built small Elasticsearch projects.
* You have only taken Elasticsearch courses.
* You have never managed a production search cluster.
* You plan to learn these technologies while working on this project.
* You cannot demonstrate similar previous work.
---
# Long-Term Opportunity
This is expected to become a long-term technical leadership role.
We are looking for someone who can own the search architecture, mentor our engineering team, and help us build one of the world's most capable AI-powered recruiting search platforms.
If you have built search systems comparable to **Juicebox.ai, Exa.ai, LinkedIn Recruiter, LinkedIn Sales Navigator, HireEZ, SeekOut, ZoomInfo, Apollo, Indeed, or similar internet-scale search platforms**, we'd like to hear from you.