Z.ai

AI Institute - Reinforcement Learning Training Framework Intern (slime)

Z.ai  •  Onsite  •  4 months ago

Job Description

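Beijing  •  Internship  •  Internet / Electronics / Online Games - R&D

Responsibilities
- Develop, optimize, and maintain the reinforcement learning (RL) training framework; continuously improve the framework and training strategies according to business needs to raise model training efficiency.
- Analyze and pinpoint performance bottlenecks in training, and implement targeted optimizations to improve training efficiency and stability.
- Follow technical progress in the industry, and continuously track and integrate the latest training optimization strategies.

Requirements
- Master's degree or above in a computer science related field, with a research focus related to HPC and MLSys.
- In-depth understanding of natural language processing, computer vision, and multimodal algorithms; familiarity with mainstream LLM architectures; experience with distributed training.
- Basic understanding of common RL training algorithms.
- Bonus: familiarity with widely used open-source inference frameworks such as vLLM or SGLang.

More information: an introduction to the team's work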
GLM-4.5: Reasoning, Coding, and Agentic Abilities
- https://z.ai/blog/glm-4.5
- GLM-4.5 has 355 billion total parameters and 32 billion active parameters; GLM-4.5-Air has 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities in a single model, to meet the increasingly complex requirements of fast-growing agentic applications.
slime: An SGLang-Native Post-Training Framework for RL Scaling
- https://lmsys.org/blog/2025-07-09-slime/
- We believe in RL. We believe RL is the final piece toward AGI.
- If you feel the same way, you'll share our vision:
  - Every field should be end-to-end RLed and every task should become an agent environment.
  - Every RL run should last longer, and every model should scale larger.
  - RL systems should integrate seamlessly with existing infrastructure, letting us focus on new ideas instead of boilerplate engineering.
- That's why we present slime, a post-training framework designed to be:
  - Versatile: with a fully customizable rollout interface and flexible training setups (colocated or decoupled, synchronous or asynchronous, RL or SFT cold start).
  - Performant: integrating SGLang for inference and Megatron-LM for training, natively.
  - Maintainable: with a lightweight codebase and smooth transition from Megatron pretraining to SGLang deployment.
In short, a post-training framework for RL scaling.
The journey of RL scaling has just begun, and slime is continuously evolving. In the next phase, we will focus on:
1. Collaborating with the SGLang team to explore optimal RL training strategies for large-scale MoE models.
2. Supporting broader post-training workflows, strengthening the pre-training-to-production bridge.

About Z.ai

Z.ai is the AI company behind the GLM series models, dedicated to inspiring the development of AGI to benefit humanity.

Industry: IT & Software
Company Size: 51-200 employees
Headquarters: Beijing, CN
Year Founded: Unknown