
Full-Stack Python + Web Developer for Voice-AI Prototype (FastAPI & Firebase)

Budget: $1000.0 FIXED / ⭐ 0.00 (0) Pakistan

python, api, api-integration, web-application

### Job Title: Full-Stack Python + Web Developer for Secure Voice-AI Prototype (FastAPI & Firebase)

### Project Overview:

We are seeking an experienced Independent Full-Stack Developer to build a lightweight, highly optimized web browser prototype for a voice-interaction companion application. The core engine links browser-based audio capture with the Google GenAI SDK (Gemini) and Google Cloud Firestore. The primary objective of this architecture is long-term conversational memory with a flat, net-zero data storage footprint, utilizing an automated text-compaction loop.

### Core Technical Scope:

1. **Frontend Web UI:** A minimalist, clean web page featuring a secure login/signup screen (Firebase Auth) and a central chat dashboard with a prominent "Hold to Speak" microphone button.
2. **Browser Microphone Handling:** Implement JavaScript (MediaRecorder API) to capture user voice input directly through the browser, package it cleanly as a lightweight payload, and stream it to the backend.
3. **Multi-Tenant Cloud Database:** Configure a Google Firebase Firestore schema with strict customer data isolation paths (/clients/client_id) protected by active security rules.
4. **AI Processing & Context Caching:** Integrate the official google-genai SDK using the gemini-2.5-flash-lite model. Implement Context Caching on system instructions to minimize recurring token ingestion overhead.
5. **Memory Compaction Loop:** Develop a background function that executes post-interaction. It must look at old text summaries + the new chat transcript, generate a newly merged, highly condensed 4-sentence profile text block, and overwrite the client's database file, completely discarding raw transcripts.
6. **Real-Time News Grounding:** Enable the native google_search tool parameter within the Gemini configuration code block to allow live global and national news headline retrieval when prompted by the user's voice stream.
7. **Spotify Intent Routing:** Implement Function/Tool Calling. When the user requests a song, genre, or artist, Gemini must output a structured JSON tool directive containing a Spotify search deep link. The web frontend browser tab must capture this link and automatically open the Spotify platform loop.
8. **Vocal Output Integration:** Route Gemini's response text to a Text-to-Speech API (such as ElevenLabs or Google Cloud TTS) and stream the resulting audio file back to the browser for automatic playback.

### Developer Requirements:

- Strong proficiency in Python (FastAPI or Flask) and Firebase/Firestore architecture.
- Hands-on experience integrating the official Google GenAI SDK and third-party voice APIs (ElevenLabs/Deepgram/Whisper).
- Ability to write secure, isolated database rules.
- Excellent, clear communication skills in plain English.
- Willingness to sign a standard mutual NDA before full project documentation or branding is revealed.

### Project Type & Budget:

- One-time project
- Experience Level: Intermediate
- Budget: Fixed-Price (Milestone-Based) – $1,000 to $1,500

### Milestone Delivery Plan:

- Milestone 1 ($300): Google Firebase Project setup, secure client-isolated Firestore rules deployed, and standard frontend login/chat layout built.
- Milestone 2 ($500): Gemini 2.5 Flash-Lite API linked with active Context Caching, native Google Search grounding active, and the automated background text-summarization loop fully functional via mock text scripts.
- Milestone 3 ($400 - $700): JavaScript browser microphone recording linked to the server, Spotify tool calling integrated, ElevenLabs voice engine connected, and successful deployment to a live testing link (Render or Heroku).
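
For applicants scoping the Memory Compaction Loop (item 5 of the Core Technical Scope), the following is a minimal sketch of the intended data flow, not a definitive implementation. It assumes an in-memory dict standing in for the client's Firestore document and a `summarize` callable standing in for the Gemini call; `compact_memory`, `clamp_sentences`, and `fake_summarize` are hypothetical names, and the naive sentence splitter is only an illustration of enforcing the 4-sentence limit.

```python
import re
from typing import Callable, Dict

MAX_SENTENCES = 4  # the posting asks for a 4-sentence profile block


def clamp_sentences(text: str, limit: int = MAX_SENTENCES) -> str:
    """Keep at most `limit` sentences, splitting naively after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(sentences[:limit])


def compact_memory(
    store: Dict[str, str],
    client_id: str,
    transcript: str,
    summarize: Callable[[str], str],
) -> str:
    """Merge the old summary with a new transcript and overwrite the stored profile."""
    old_summary = store.get(client_id, "")
    prompt = (
        "Merge the existing profile and the new conversation into at most "
        f"{MAX_SENTENCES} sentences.\n\n"
        f"Existing profile:\n{old_summary}\n\n"
        f"New conversation:\n{transcript}"
    )
    new_summary = clamp_sentences(summarize(prompt))
    # Overwrite in place: the raw transcript is never written to the store.
    store[client_id] = new_summary
    return new_summary


def fake_summarize(prompt: str) -> str:
    """Hypothetical stand-in for the model call; returns a fixed five-sentence reply."""
    return "Likes jazz. Lives in Lisbon. Has a dog. Works nights. Prefers short answers."


# Assumption: the dict key plays the role of the /clients/client_id path.
store = {"client_123": "Likes jazz."}
profile = compact_memory(store, "client_123", "User: I just adopted a dog.", fake_summarize)
print(profile)
```

Because the store holds exactly one bounded text block per client and is overwritten on every run, storage stays flat regardless of how many conversations occur, which is the "net-zero" footprint the overview describes.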
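
The Spotify Intent Routing requirement (item 7) could be served by a small server-side function that the model invokes through tool calling. The sketch below shows one possible shape for the structured JSON directive. The `open_spotify` tool name, the field names, and the use of the `https://open.spotify.com/search/` URL pattern are assumptions for illustration and are not specified in this posting.

```python
import json
from urllib.parse import quote

# Assumed deep-link pattern for a Spotify search; confirm before relying on it.
SPOTIFY_SEARCH_BASE = "https://open.spotify.com/search/"


def build_spotify_directive(query: str) -> str:
    """Return a JSON tool directive (hypothetical shape) carrying a search deep link."""
    cleaned = query.strip()
    # Percent-encode everything, including '/', so names like "AC/DC" stay one path segment.
    link = SPOTIFY_SEARCH_BASE + quote(cleaned, safe="")
    return json.dumps({"tool": "open_spotify", "query": cleaned, "url": link})


directive = json.loads(build_spotify_directive("Miles Davis"))
print(directive["url"])
```

On the frontend, the chat dashboard would read the `url` field from the directive and open it in a new tab; how that hand-off is wired is left to the implementer.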