ByteDance

Senior Research Scientist (Multimodal Large Language Model) - PICO

ByteDance  •  $213k - $450k/yr  •  San Jose, CA (Onsite)  •  18 days ago

Job Description

About the Team
The PICO-MR team develops core technologies for intelligent human-computer interaction in mixed reality (MR) environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions include multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (monocular, stereo, and multi-view stereo), large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation. The work covers both fundamental research and industrial-grade solution deployment.

Responsibilities:
1. Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other modalities. This includes model architecture optimization, cross-modal alignment, data construction, evaluation system improvement, and end-to-end training and inference acceleration.
2. Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to use specialized spatial-interaction and spatial-computing tools proficiently, support tool calls in both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
3. Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
4. Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
5. Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.

The base salary range for this position in the selected city is $212,800 - $450,000 annually.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services - over 2.5bn people interact with ByteDance products including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we aim towards achieving every day. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: China, CN
Year Founded: Unknown