Job Description
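【Zhipu Star】 2026 Campus Recruitment - AI Institute - Large Model Algorithm Engineer - Reasoning Models
Beijing, Full-time, Internet / Electronics / Online Games - R&D
Position Description
GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities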
We are releasing the latest version of our flagship model: GLM-4.6. Compared with GLM-4.5, this generation brings several key improvements:
Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
More capable agents: GLM-4.6 exhibits stronger performance in tool-using and search-based agents, and integrates more effectively within agent frameworks.
Refined writing: GLM-4.6 aligns better with human preferences in style and readability, and performs more naturally in role-playing scenarios.
We evaluated GLM-4.6 across eight public benchmarks covering agents, reasoning, and coding. Results show clear gains over GLM-4.5. GLM-4.6 also holds competitive advantages over leading domestic and international models such as DeepSeek-V3.2-Exp and Claude Sonnet 4, but it still lags behind Claude Sonnet 4.5 in coding ability.
Position Requirements
【Responsibilities】
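1. Alignment data optimization: construct, filter, and optimize data targeting specific model capabilities, and filter and synthesize data for specific domains (such as mathematics, code, and complex reasoning);
2. Control the quality and diversity of alignment data;
3. Post-training scalability: explore how models can achieve better results on complex tasks through longer chain-of-thought reasoning, and how training and inference scale in the post-training stage;
4. Reinforcement learning algorithm optimization: improve algorithm scalability and stability to boost post-training scaling performance; optimize multi-objective reward models, and improve reward models by combining CoT with process supervision;
5. Alignment paradigm exploration: explore training optimizations that combine model supervision, self-improvement, and related methods;
6. Complex reasoning in interactive tasks, and optimization of long-text generation;
7. Reinforcement learning framework efficiency: optimize training speed for the needs of LLM reinforcement learning, and research and develop tools that improve the training team's efficiency.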
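【Requirements】
1. Master's or doctoral degree in computer science, electronics, automation, or a related field from a Project 985 university (outstanding undergraduates may also be considered);
2. In-depth understanding of common large-model algorithms; project experience in post-training and data processing is preferred;
3. Candidates who have published related papers at CCF-A conferences are preferred;
4. Proficiency with mainstream frameworks such as PyTorch, transformers, and Megatron;
5. A conscientious and responsible work attitude, with strong teamwork skills.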
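【Preferred Qualifications】
1. Publications at top conferences or journals such as ACL, NeurIPS, ICLR, EMNLP, and ICML are preferred;
2. Familiarity with parallel training frameworks and experience with multi-node, multi-GPU training are preferred.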