Job Description
About the Team
The PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions cover cutting-edge fields including multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation. This work encompasses both fundamental research breakthroughs and industrial-grade solution deployment.
Responsibilities:
1. Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other multimodal information. This includes model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
2. Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to proficiently use specialized tools for spatial interaction and spatial computing, support tool calls in both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
3. Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
4. Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
5. Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.
The base salary range for this position in the selected city is $212,800 - $450,000 annually.