DevOps Engineer for Voice AI Platform (Uptime Monitoring & Reliability)
Budget: $20.0 - $45.0
HOURLY / FULL_TIME
⭐ 4.90 (78)
Germany
devops, amazon-web-services, github
We are looking for an experienced **DevOps Engineer** to take ownership of the reliability, monitoring, and operational stability of our Voice AI product **Klara AI** ([www.klara-ai.de](http://www.klara-ai.de)).
Klara AI is a production-ready Voice AI used in recruiting workflows and is **integrated with Recruitee**. The system is already live; this is **not a greenfield project**. Your task is to ensure professional-grade uptime, monitoring, and operational excellence, with status reporting comparable to **[https://status.close.com/](https://status.close.com/)**.
---
### Scope of Work
**Primary Responsibilities**
* Design and implement a **robust uptime and incident monitoring system** (public status page preferred)
* Proactively **monitor system health, failures, latency, and API availability**
* Set up **alerts, logging, and escalation processes**
* Ensure **high availability and reliability** across all critical services
* Ongoing **DevOps supervision** of production systems
**Secondary Responsibilities (Nice to Have)**
* Minor **bug fixes** in backend (Python) and frontend (ReactJS)
* Support CI/CD workflows and GitHub-based versioning
---
### Technical Stack
* **Backend:** Python
* **Frontend:** ReactJS
* **Infrastructure:** AWS (Region: Europe / Frankfurt, eu-central-1)
* **AI / Voice Stack:**
  * OpenAI (LLM)
  * VAPI
  * Twilio (calling)
  * ElevenLabs (voice synthesis)
* **Version Control:** GitHub (fully versioned)
---
### Required Skills
* Strong **DevOps experience on AWS**
* Proven hands-on experience with **VAPI** and **Twilio**
* Solid understanding of **cloud monitoring, logging, and alerting**
* Experience building **status dashboards / uptime pages**
* Familiarity with **Python** and **ReactJS** for minor fixes
* Production mindset: reliability, security, and scalability
---
### What We Expect
* Independent, structured work
* Clear documentation of monitoring and alerting setup
* Focus on **stability, uptime, and prevention**, not firefighting
* Long-term availability for system supervision is a plus
---
### Project Type
* Initial setup project
* Ongoing collaboration possible if performance and reliability standards are met
If you have built or maintained **mission-critical SaaS or Voice AI systems**, this project is a strong fit.
Please include **relevant DevOps / monitoring examples** in your proposal.