
Production pipeline reliability

Budget: $2500.0 FIXED / ⭐ 4.29 (18) United Arab Emirates

kubernetes, python, system-automation, orchestration, software-debugging

We run an AI document-processing system in production. It ingests documents, runs a structured-extraction pipeline, and from the extracted data it generates articles and dashboards. The pipelines are orchestrated on a container platform. The system is split across multiple repositories, microservices style, so the stalls can cross service boundaries.

The problem: these pipelines can stall in production. Jobs stop making progress and the flow halts until someone intervenes manually. We need them to run unattended and reliably.

Scope, and only this: make the structured extraction, article generation, and dashboard generation pipelines never get stuck in production. Find why they stall, fix it, and prove it holds. Nothing else is in scope. We will share what we have observed about the stalls once you start.

Doing this properly is not just surface patches. It may require strengthening the business logic and the orchestration behind these pipelines. We expect real fixes to root causes, not workarounds that hide the symptom.

How the project works:
- Two milestones, published on Upwork.
- Milestone 1, setup and framework understanding: get set up across the multiple repositories, get access, and demonstrate a working understanding of the architecture and the orchestration.
- Milestone 2, the actual work: make the pipelines stall-free in production. Stability will be evaluated by multiple pipeline runs across the following week. This milestone is passed only if no pipeline stalls across those runs. If any run still stalls, it is not met, and the fix continues until it holds.
- 5 days maximum.
- We require daily updates on Slack, with short check-in calls on Google Meet.

If this milestone is passed successfully, we may continue with a longer term commitment.

You should be strong in production debugging of orchestrated data pipelines, backend services, and the infrastructure they run on, and comfortable proving reliability rather than just patching symptoms.

No agencies and no project managers. We want the engineer who will do the work, directly. The selected developer signs an NDA before commencing work.

To apply, send us a short video in which you address the questions and show examples of your previous work, in particular work related to what we need here. Applications written with AI will be rejected automatically. If we are happy with your application, there is a quick follow-up interview.