Founding AI Research Engineer (Multimodal AI)

Budget: $10.00 - $25.00 hourly, part-time | Rating: 0.00 (0) | South Korea

artificial-intelligence, machine-learning, python, deep-learning

I am building a long-term development initiative focused on next-generation multimodal AI systems capable of understanding and generating:

- Text
- Images
- Audio
- Video
- 3D Spatial Data

The ultimate goal is to develop advanced multimodal foundation models and embodied/spatial AI systems inspired by technologies such as GPT-4o, Gemini, Qwen-Omni, LLaVA, VideoLLaMA, Depth Anything, MASt3R, and Loc3R.

I am seeking a highly motivated AI Engineer or Research Engineer to work closely with me as a technical partner throughout this journey.

Responsibilities

- Implement and reproduce state-of-the-art AI papers
- Build training and evaluation pipelines in PyTorch
- Develop multimodal AI systems integrating text, image, audio, and video understanding
- Work on Vision-Language Models (VLMs)
- Explore spatial reasoning and 3D perception systems
- Conduct literature reviews and benchmark analysis
- Design experiments and analyze model performance
- Contribute to open-source research projects

How to Apply

Please send:

- GitHub profile
- Relevant AI/ML projects
- Papers reproduced or implemented
- Resume or LinkedIn profile
- Brief explanation of why this project interests you

Candidates who have successfully implemented or reproduced models such as CLIP, LLaVA, ViT, Whisper, Depth Anything, or similar research projects are especially encouraged to apply.

