AI Engineer to Build a 100% AI UGC Video Generation Platform

Budget: $1000.0 FIXED / ⭐ 4.75 (4) Italy

artificial-intelligence, python

We're looking for an AI / generative video specialist to build a self-hosted platform (running on our own server) that generates fully AI UGC videos, starting from an avatar we create ourselves and producing spoken clips in Italian with accurate, properly synchronized lip sync.

We operate in the supplements and beauty sector, so our videos often feature physical products on camera. It is critical that the product label stays intact, legible, and undistorted throughout AI generation: no warped text, garbled logos, or mangled packaging. This is a hard requirement.

We already have active, working accounts on the main services (Seedance, fal.ai, ElevenLabs, and others). The platform must connect to these via API, not rebuild everything from scratch.

Important, what already works: the Italian voice generated by ElevenLabs works very well for us. That part of the pipeline is solid, and we want to keep using it. Each avatar will have its own ElevenLabs voice code, which the platform should use as the starting point for generation.

What the platform must do

- Maintain an internal library of avatars (5–6, no more) that we create ourselves using Nano Banana (reference images of the characters). Generation should always start from one of these avatars, paired with its corresponding ElevenLabs voice code.
- Generate multiple videos in parallel, where each subsequent scene starts from the last frame of the previous scene, to maintain visual consistency and character continuity.
- When a product appears in the scene, keep the label/packaging accurate and readable (correct text, logo, and design) across every generated frame, including scenes generated from the last frame of the previous one, where drift tends to accumulate.
- Automatically stitch the individual scenes (roughly 5–6 seconds each) into a single final video.
- Desired output: full AI videos lasting between 30 and 60 seconds (60 max).
- Maintain correct Italian dialogue, perfectly synchronized with the lip sync.
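To make the scene-chaining requirement concrete, here is a minimal sketch of how it could be orchestrated. Everything named here (`Avatar`, `Scene`, `generate_video`, `generate_batch`, `fake_generate_scene`) is hypothetical and only for illustration; the real image-to-video call to Seedance or fal.ai would replace the fake generator. The sketch assumes that scenes within one video must run one after another, because each needs the previous scene's last frame, while separate videos can run concurrently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Avatar:
    name: str
    reference_image: str      # Nano Banana reference image (path or URL)
    elevenlabs_voice_id: str  # the avatar's ElevenLabs voice code


@dataclass(frozen=True)
class Scene:
    index: int
    start_frame: str  # image the scene was generated from
    clip: str         # generated clip (path or URL)
    last_frame: str   # final frame, used to seed the next scene


# Hypothetical backend hook: (avatar, start_frame, line) -> (clip, last_frame).
SceneGenerator = Callable[[Avatar, str, str], Awaitable[Tuple[str, str]]]


async def generate_video(
    avatar: Avatar, lines: Sequence[str], generate_scene: SceneGenerator
) -> List[Scene]:
    """Scenes run sequentially: scene N+1 needs scene N's last frame."""
    scenes: List[Scene] = []
    start_frame = avatar.reference_image  # the first scene starts from the avatar
    for index, line in enumerate(lines):
        clip, last_frame = await generate_scene(avatar, start_frame, line)
        scenes.append(Scene(index, start_frame, clip, last_frame))
        start_frame = last_frame
    return scenes


async def generate_batch(
    jobs: Sequence[Tuple[Avatar, Sequence[str]]],
    generate_scene: SceneGenerator,
    max_parallel: int = 3,
) -> List[List[Scene]]:
    """Independent videos run concurrently, capped at max_parallel."""
    limiter = asyncio.Semaphore(max_parallel)

    async def run(avatar: Avatar, lines: Sequence[str]) -> List[Scene]:
        async with limiter:
            return await generate_video(avatar, lines, generate_scene)

    return list(await asyncio.gather(*(run(a, l) for a, l in jobs)))


async def fake_generate_scene(
    avatar: Avatar, start_frame: str, line: str
) -> Tuple[str, str]:
    """Stand-in for a real image-to-video call; returns made-up file names."""
    await asyncio.sleep(0)
    stem = f"{avatar.name}/{line[:8].strip().replace(' ', '_')}"
    return f"{stem}.mp4", f"{stem}_last.png"
```

Running `asyncio.run(generate_batch(jobs, fake_generate_scene))` with a list of `(avatar, script_lines)` pairs returns one list of scenes per video. Because errors compound along the chain, a real implementation might also re-inject the product reference image at each step rather than relying on the last frame alone; that is a design suggestion, not something this sketch does.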
Critical point: ITALIAN LIP SYNC

We'll say it twice: the lip sync must work in ITALIAN. This is the core problem. So far we have not been able to get a decent result. Working Italian lip sync is the requirement that determines whether this project succeeds. If you don't have concrete experience with Italian lip sync (or with solving model limitations on non-English languages), this project is not for you.

Architecture / integrations

- Platform hosted on our own server (no closed third-party SaaS).
- API integration with the services we already use: Seedance, fal.ai, ElevenLabs, and our other active accounts.
- Orchestrated pipeline: avatar (from internal library) + ElevenLabs Italian voice → parallel scene generation with frame-to-frame continuity → lip sync → automatic editing → export.

We're open to suggestions

We're not locked into a specific technical approach. We're open to any suggestions on tools, models, or architecture that will make the platform work reliably, especially anything that improves Italian lip sync quality and keeps product labels stable. If you know a better way to achieve our goal, tell us.

Deliverables

- A working platform, installed on our server.
- Technical documentation (setup, APIs, configuration).
- A reproducible pipeline: from avatar to final edited video.
- At least 2–3 test videos, each 30–60 seconds long, demonstrating working Italian lip sync.
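As one possible shape for the automatic editing and export step of the pipeline, the sketch below plans how many 5–6 second scenes a 30–60 second video needs and builds an FFmpeg concat-demuxer job to stitch them. The function names (`scenes_needed`, `concat_list`, `ffmpeg_concat_command`) are hypothetical, and using FFmpeg at all is an assumption, since the posting names no editing tool. Stream copy (`-c copy`) also assumes every clip shares the same codec, resolution, and frame rate; otherwise the clips would need re-encoding.

```python
import math
from typing import List, Sequence

MIN_TOTAL_S, MAX_TOTAL_S = 30.0, 60.0  # target range from the requirements


def scenes_needed(target_seconds: float, scene_seconds: float) -> int:
    """Number of scenes required to cover the target duration.

    Rounds up, so the last scene may need trimming to stay within 60 s.
    """
    if not MIN_TOTAL_S <= target_seconds <= MAX_TOTAL_S:
        raise ValueError("target duration must be between 30 and 60 seconds")
    if scene_seconds <= 0:
        raise ValueError("scene duration must be positive")
    return math.ceil(target_seconds / scene_seconds)


def concat_list(clips: Sequence[str]) -> str:
    """Contents of the list file read by FFmpeg's concat demuxer."""
    lines = []
    for clip in clips:
        escaped = clip.replace("'", "'\\''")  # escape single quotes in paths
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_concat_command(list_path: str, output_path: str) -> List[str]:
    """Argument list for stitching the clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",
        output_path,
    ]
```

The command list can be passed to `subprocess.run(..., check=True)` once the list file has been written to disk. Lip sync is deliberately not covered here, since which model handles Italian well is exactly the open question of the project.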
