Founding AI Research Engineer (Multimodal AI)
Budget: $10.0 - $25.0
Hourly / Part-time
South Korea
artificial-intelligence, machine-learning, python, deep-learning
I am building a long-term development initiative focused on next-generation multimodal AI systems capable of understanding and generating:
Text
Images
Audio
Video
3D Spatial Data
The ultimate goal is to develop advanced multimodal foundation models and embodied/spatial AI systems inspired by technologies such as GPT-4o, Gemini, Qwen-Omni, LLaVA, VideoLLaMA, Depth Anything, MASt3R, and Loc3R.
I am seeking a highly motivated AI Engineer or Research Engineer to work closely with me as a technical partner throughout this journey.
Responsibilities
Implement and reproduce state-of-the-art AI papers
Build training and evaluation pipelines in PyTorch
Develop multimodal AI systems integrating text, image, audio, and video understanding
Work on Vision-Language Models (VLMs)
Explore spatial reasoning and 3D perception systems
Conduct literature reviews and benchmark analysis
Design experiments and analyze model performance
Contribute to open-source research projects
How to Apply
Please send:
GitHub profile
Relevant AI/ML projects
Papers reproduced or implemented
Resume or LinkedIn profile
Brief explanation of why this project interests you
Candidates who have successfully implemented or reproduced models such as CLIP, LLaVA, ViT, Whisper, Depth Anything, or similar research projects are especially encouraged to apply.