
Senior Search Architect (Lucene / Elasticsearch / OpenSearch / Algolia) – 200 Million Resumes

Budget: $50.0 - $200.0 HOURLY / PART_TIME ⭐ 4.93 (233) United States

lucene-search, elasticsearch, python, api

# Senior Search Architect (Lucene / Elasticsearch / OpenSearch / Vector Search) – Build a 200M Candidate Search Platform

# IMPORTANT – Please Read Before Applying

## We are building a search platform comparable to:

* **Juicebox.ai**
* **Exa.ai**
* **LinkedIn Recruiter**
* **LinkedIn Sales Navigator**

If you have built search platforms with similar complexity and scale, we would like to speak with you.

**Please do NOT apply if you are planning to learn these technologies during this project.**

We are specifically looking for someone who has already designed and built large-scale production search systems and can demonstrate previous work.

---

# Overview

We are building a next-generation AI-powered recruiting platform capable of searching **200 million resumes and LinkedIn profiles**.

This is **not** a standard Elasticsearch implementation.

We need an experienced Search Architect who has built production systems capable of:

* Hundreds of millions of searchable documents
* Complex recruiter-style faceted search
* Sub-second response times
* Extremely fast facet counts
* High relevance ranking
* Horizontal scalability

Our goal is to create the best recruiter search platform in the industry.

---

# About the Data

We currently have approximately **200 million resumes and LinkedIn profiles**, all stored in JSON format.

The platform should support continuous ingestion and updates while maintaining excellent search performance.

---

# Responsibilities

Design the overall search architecture and recommend the best technology stack.

Responsibilities include:

* Search architecture
* Data modeling
* Index design
* Ranking strategy
* Relevance tuning
* Faceted search
* Distributed indexing
* Cluster architecture
* Performance optimization
* Capacity planning
* Scalability planning

---

# Search Features

The search experience should feel similar to LinkedIn Recruiter and Juicebox.ai.
Recruiters should be able to search using combinations of:

## Companies

* Current company
* Previous companies
* Multiple companies
* Include companies
* Exclude companies

## Job Titles

* Current title
* Previous titles
* Multiple titles
* Include
* Exclude

## Skills

* Multiple skills
* Required skills
* Excluded skills
* Skill normalization
* Synonyms

## Location

* Country
* State
* City
* ZIP Code
* Radius search (5–100 miles)
* Multiple cities
* Multiple states

## Education

* Schools
* Universities
* Ivy League filter
* Degrees
* Multiple schools

## Experience

* Total years of experience
* Years at current company
* Years in industry
* Years in specific role

## Additional Filters

* Industry
* Certifications
* Security clearance
* Employment type
* Languages
* Work authorization
* Remote / Hybrid / Onsite
* Seniority
* Management level

Most filters should support:

* Include
* Exclude
* Multiple values
* Boolean AND / OR logic
* Phrase search
* Exact search
* Prefix search
* Fuzzy matching
* Synonyms

---

# Search Performance Requirements

The platform should support:

* Sub-second search response
* Sub-second facet counts
* Fast aggregations
* Fast autocomplete
* Instant suggestions
* Efficient deep pagination
* Millions of searches per day
* Horizontal scalability
* High availability
* Fault tolerance

---

# AI & Semantic Search

We intend to use open-source LLMs to enrich resume data before indexing.

Possible enrichment includes:

* Skill extraction
* Skill normalization
* Company normalization
* Industry classification
* Job title normalization
* Resume classification
* Candidate summarization

We are also open to implementing:

* Vector embeddings
* Semantic search
* Hybrid keyword + vector search
* Retrieval-augmented ranking

We are interested in your recommendations based on production experience.

---

# Technology

We are **not committed to any specific technology stack**.
We are open to using whichever technologies provide the best scalability, relevance, maintainability, and performance.

Examples include:

### Search

* Lucene
* Elasticsearch
* OpenSearch
* Algolia
* Solr
* Vespa
* Typesense

### Vector Search

* Weaviate
* Qdrant
* Milvus
* Pinecone
* pgvector
* Redis Search
* FAISS
* HNSW

### Databases

We are open to using any database architecture if it makes sense for the solution, including:

* PostgreSQL
* MongoDB
* ClickHouse
* Cassandra
* ScyllaDB
* DynamoDB
* Neo4j
* Other specialized databases

### APIs

Experience with:

* GraphQL
* REST APIs

is highly desirable.

### Graph Technologies

Knowledge of:

* Neo4j
* Graph databases
* Relationship graphs

is a significant plus for future recruiter relationship intelligence features.

---

# What We're Looking For

You should have real-world experience with:

* 100M+ indexed documents
* Elasticsearch/OpenSearch clusters
* Lucene internals
* Distributed indexing
* Large-scale faceted search
* Search relevance tuning
* Custom analyzers
* Synonyms
* Ranking algorithms
* Aggregations
* Performance optimization
* Capacity planning
* High-volume production systems

Experience building recruiting, staffing, ATS, CRM, people search, enterprise search, e-commerce, or internet-scale search products is highly preferred.

---

# Mandatory Requirements

Please **do not apply** unless you have built a comparable production search platform.

Your proposal should include:

* The largest search system you have built
* Approximate number of indexed documents
* Number of daily searches
* Search technology used
* Database(s) used
* Whether embeddings/vector search were used
* Cluster size
* Average query latency
* Average facet latency
* Your role on the project
* Team size

---

# Interview Requirements

You must be able to demonstrate previous work.
Examples include:

* Live demo
* Recorded demo
* Screenshots
* Architecture diagrams
* Code walkthrough
* Performance dashboards

You should be comfortable discussing:

* Architecture decisions
* Indexing strategy
* Ranking
* Relevance tuning
* Scaling strategy
* Search optimization
* Cluster management
* Performance bottlenecks

---

# Please Do Not Apply If

* You have only built small Elasticsearch projects.
* You have only taken Elasticsearch courses.
* You have never managed a production search cluster.
* You plan to learn these technologies while working on this project.
* You cannot demonstrate similar previous work.

---

# Long-Term Opportunity

This is expected to become a long-term technical leadership role.

We are looking for someone who can own the search architecture, mentor our engineering team, and help us build one of the world's most capable AI-powered recruiting search platforms.

If you have built search systems comparable to **Juicebox.ai, Exa.ai, LinkedIn Recruiter, LinkedIn Sales Navigator, HireEZ, SeekOut, ZoomInfo, Apollo, Indeed, or similar internet-scale search platforms**, we'd like to hear from you.