Full-Stack Python + Web Developer for Voice-AI Prototype (FastAPI & Firebase)
Budget: $1,000 (fixed price)
Location: Pakistan
Skills: python, api, api-integration, web-application
### Job Title:
Full-Stack Python + Web Developer for Secure Voice-AI Prototype (FastAPI & Firebase)
### Project Overview:
We are seeking an experienced independent full-stack developer to build a lightweight, well-optimized, browser-based prototype of a voice-interaction companion application. The core engine connects browser-based audio capture to the Google GenAI SDK (Gemini) and Google Cloud Firestore.
The primary objective of this architecture is long-term conversational memory with a flat, bounded storage footprint per client: an automated text-compaction loop keeps each client's memory at a fixed size instead of letting transcripts accumulate.
### Core Technical Scope:
1. **Frontend Web UI:** A minimalist, clean web page featuring a secure login/signup screen (Firebase Auth) and a central chat dashboard with a prominent "Hold to Speak" microphone button.
2. **Browser Microphone Handling:** Use JavaScript (the MediaRecorder API) to capture the user's voice directly in the browser, package it as a lightweight audio payload, and stream it to the backend.
3. **Multi-Tenant Cloud Database:** Configure a Cloud Firestore schema with strict per-customer data isolation (each client's data under `/clients/{clientId}`), enforced by deployed Firestore Security Rules.
4. **AI Processing & Context Caching:** Integrate the official `google-genai` SDK using the `gemini-2.5-flash-lite` model. Use Context Caching for the system instructions to reduce the token cost of re-sending them on every request.
5. **Memory Compaction Loop:** Develop a background function that runs after each interaction. It must combine the client's existing summary with the new chat transcript, generate a merged, highly condensed 4-sentence profile, and overwrite the client's Firestore document, discarding the raw transcript entirely.
6. **Real-Time News Grounding:** Enable the built-in Google Search tool (`google_search`) in the Gemini request configuration so the model can retrieve current global and national news headlines when the user asks for them by voice.
7. **Spotify Intent Routing:** Implement function (tool) calling. When the user requests a song, genre, or artist, Gemini must return a structured JSON tool directive containing a Spotify search deep link. The frontend browser tab must capture this link and automatically open Spotify.
8. **Vocal Output Integration:** Route Gemini's response text to a Text-to-Speech API (such as ElevenLabs or Google Cloud TTS) and stream the resulting audio file back to the browser for automatic playback.
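Firestore Security Rules are written in their own rules language, typically `match /clients/{clientId}/{document=**} { allow read, write: if request.auth != null && request.auth.uid == clientId; }`, so they are not shown in Python here. The sketch below mirrors the same isolation idea from item 3 on the backend, where the Firebase Admin SDK bypasses Security Rules: every document path is built through one helper that refuses to cross tenants. It assumes the client ID equals the Firebase Auth UID; the helper name, the conservative ID whitelist, and the `memory/profile` sub-path are hypothetical.

```python
import re

# Assumption: a conservative whitelist for client IDs and sub-path segments.
# UIDs generated by Firebase Auth fit it; custom UIDs might not.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]{1,128}")


class TenantAccessError(PermissionError):
    """Raised when a caller tries to reach another client's data."""


def client_doc_path(auth_uid: str, client_id: str, *subpath: str) -> str:
    """Build a Firestore document path scoped to a single client.

    Mirrors `request.auth.uid == clientId` for server code running under
    the Admin SDK, which does not evaluate Security Rules.
    """
    if not auth_uid or auth_uid != client_id:
        raise TenantAccessError("authenticated user does not own this client path")
    if len(subpath) % 2:
        # Document paths alternate collection/document, so the segments
        # after /clients/{clientId} must come in pairs.
        raise ValueError("subpath must contain collection/document pairs")
    for segment in (client_id, *subpath):
        # fullmatch rejects slashes, empty segments, "." and "..", so a
        # crafted ID cannot escape the /clients/{clientId} prefix.
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/".join(("clients", client_id, *subpath))
```

For example, `client_doc_path(uid, uid, "memory", "profile")` yields `clients/<uid>/memory/profile`, which can be passed to the Admin SDK's `firestore.client().document(...)`. Keeping the check in one helper means a missing rule on the server side fails loudly instead of silently reading another tenant's data.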
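For items 4 and 6, a minimal sketch with the `google-genai` SDK might look like the following. It assumes `pip install google-genai`, an API key in the `GEMINI_API_KEY` environment variable, and that `gemini-2.5-flash-lite` supports both explicit caching and Search grounding; the system prompt, display name, and TTL are placeholders. Explicit caches also have a per-model minimum token count, so a short system prompt on its own may be rejected until it carries longer, stable instructions. The SDK is imported inside each function so the module loads even where the SDK is not installed.

```python
# Requires: pip install google-genai (imported lazily inside each function).
MODEL = "gemini-2.5-flash-lite"
SYSTEM_PROMPT = "You are a concise, friendly voice companion."  # placeholder


def create_system_cache(client, system_prompt: str = SYSTEM_PROMPT, ttl: str = "3600s"):
    """Cache the system instruction once so later turns reference it by name."""
    from google.genai import types

    return client.caches.create(
        model=MODEL,
        config=types.CreateCachedContentConfig(
            display_name="companion-system-prompt",
            system_instruction=system_prompt,
            ttl=ttl,
        ),
    )


def reply_with_cache(client, cache_name: str, user_text: str) -> str:
    """Answer one turn, reusing the cached system instruction."""
    from google.genai import types

    response = client.models.generate_content(
        model=MODEL,
        contents=user_text,
        config=types.GenerateContentConfig(cached_content=cache_name),
    )
    return response.text


def reply_with_news(client, user_text: str) -> str:
    """Answer a news question with Google Search grounding enabled."""
    from google.genai import types

    response = client.models.generate_content(
        model=MODEL,
        contents=user_text,
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text
```

At startup, `client = genai.Client()` (after `from google import genai`) reads the key from the environment; call `create_system_cache(client)` once and pass `cache.name` to `reply_with_cache` on each turn. Search grounding is kept in a separate function here because a request that references `cached_content` is generally not expected to set its own system instruction or tools as well; if both are needed in one call, the tool would likely have to move into the cache configuration.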
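Item 5 can be sketched as a pure function with two stand-ins: a dict in place of the client's Firestore document and a `summarize` callable in place of the Gemini call. The function names and prompt wording are hypothetical. In a FastAPI app, the pass could be scheduled with `BackgroundTasks` so it runs after the reply has already been sent.

```python
import re
from typing import Callable, MutableMapping

MAX_SENTENCES = 4

# Hypothetical prompt wording; tune it against real transcripts.
COMPACTION_PROMPT = (
    "Merge the existing profile and the new conversation into a single profile "
    "of at most {n} sentences. Keep durable facts, preferences, and open "
    "threads; drop small talk.\n\n"
    "Existing profile:\n{profile}\n\n"
    "New transcript:\n{transcript}"
)


def build_compaction_prompt(profile: str, transcript: str) -> str:
    return COMPACTION_PROMPT.format(
        n=MAX_SENTENCES, profile=profile or "(none)", transcript=transcript
    )


def clamp_sentences(text: str, limit: int = MAX_SENTENCES) -> str:
    """Keep at most `limit` sentences in case the model returns more.

    Naive splitter: abbreviations such as "Dr." also count as sentence ends.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences[:limit] if s)


def compact_memory(
    store: MutableMapping[str, str],
    client_id: str,
    transcript: str,
    summarize: Callable[[str], str],
) -> str:
    """Merge old profile + new transcript, then overwrite the stored profile.

    Only the condensed profile is written; the raw transcript is never stored,
    so each client's memory stays bounded in size.
    """
    old_profile = store.get(client_id, "")
    new_profile = clamp_sentences(summarize(build_compaction_prompt(old_profile, transcript)))
    store[client_id] = new_profile  # overwrite, never append
    return new_profile
```

With a real model, `summarize` would wrap a `generate_content` call and return `response.text`. Clamping on the server, rather than trusting the prompt alone, guarantees the 4-sentence limit even when the model overshoots.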
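For item 7, one way to structure the tool call is sketched below. As a design choice, this sketch lets Gemini supply only the search text and builds the deep link on the server, rather than trusting the model to write a URL. The function name `open_spotify_search`, the declaration fields, and the `https://open.spotify.com/search/` URL format are assumptions. The declaration would be passed to Gemini as a function declaration in the request's tools, and the name and arguments of the returned call (for example from `response.function_calls` in `google-genai`) fed to `tool_directive`.

```python
from urllib.parse import quote

# Assumption: Spotify's web search URL takes the query as the last path segment.
SPOTIFY_SEARCH_BASE = "https://open.spotify.com/search/"

# Hypothetical function declaration in the OpenAPI-style schema Gemini tools use.
OPEN_SPOTIFY_DECLARATION = {
    "name": "open_spotify_search",
    "description": "Open Spotify search for a song, artist, or genre the user asked to hear.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Song title, artist, or genre, e.g. 'Miles Davis'.",
            }
        },
        "required": ["query"],
    },
}


def spotify_search_link(args: dict) -> str:
    """Build a Spotify search deep link from the model's tool-call arguments."""
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("tool call is missing a 'query' argument")
    # Percent-encode everything, including '/', so the query stays one segment.
    return SPOTIFY_SEARCH_BASE + quote(query, safe="")


def tool_directive(name: str, args: dict) -> dict:
    """Translate a Gemini function call into the JSON the browser acts on."""
    if name != OPEN_SPOTIFY_DECLARATION["name"]:
        raise ValueError(f"unexpected tool call: {name}")
    return {"action": "open_url", "url": spotify_search_link(args)}
```

On the frontend, the tab can open the returned URL with `window.open(url, "_blank")`. Browsers may block pop-ups that are not triggered directly by a user gesture, and this directive arrives asynchronously after the "Hold to Speak" button is released, so showing a clickable fallback link is a sensible safeguard.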
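For item 8, the ElevenLabs REST call can be assembled with the standard library alone. This sketch assumes the `POST /v1/text-to-speech/{voice_id}` endpoint with an `xi-api-key` header and the `eleven_multilingual_v2` model; the voice ID, model ID, and output format should be checked against current ElevenLabs documentation, which also offers a `/stream` variant for lower latency.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: ElevenLabs text-to-speech REST endpoint (non-streaming variant).
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumed model ID
) -> urllib.request.Request:
    """Prepare (but do not send) a TTS request for Gemini's reply text."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=quote(voice_id, safe="")),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the MP3 bytes to forward to the browser."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A FastAPI route could return these bytes with `Response(content=audio, media_type="audio/mpeg")`, and the browser can play them through an `Audio` element pointed at a `Blob` URL. Separating request construction from sending keeps the payload easy to inspect without calling the paid API.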
### Developer Requirements:
- Strong proficiency in Python (FastAPI or Flask) and Firebase/Firestore architecture.
- Hands-on experience integrating the official Google GenAI SDK and third-party voice APIs (ElevenLabs/Deepgram/Whisper).
- Ability to write secure Firestore Security Rules that isolate each client's data.
- Excellent, clear communication skills in plain English.
- Willingness to sign a standard mutual NDA before full project documentation or branding is revealed.
### Project Type & Budget:
- One-time project
- Experience Level: Intermediate
- Budget: Fixed-Price (Milestone-Based) – $1,000 to $1,500
### Milestone Delivery Plan:
- Milestone 1 ($300): Firebase project setup, client-isolated Firestore Security Rules deployed, and the standard frontend login/chat layout built.
- Milestone 2 ($500): Gemini 2.5 Flash-Lite API connected with active Context Caching, native Google Search grounding enabled, and the automated background summarization loop fully functional, verified with mock text transcripts.
- Milestone 3 ($400 - $700): Browser microphone recording (JavaScript) connected to the server, Spotify tool calling integrated, ElevenLabs voice engine connected, and successful deployment to a live testing link (Render or Heroku).