Pony.ai

Distributed Training Platform Engineer - Nansha, Guangzhou

Pony.ai  •  Onsite  •  2 months ago

Job Description

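Locations: Beijing, Guangzhou, Shanghai
Employment type: Full-time

Responsibilities
1. Own performance optimization and stability improvements for the distributed training platform;
2. Optimize multi-node, multi-GPU training throughput and resource utilization (GPU, CPU, network, storage);
3. Identify and resolve issues such as communication bottlenecks, GPU memory pressure, and long-tail stalls;
4. Maintain and evolve the training scheduling and resource management systems and related infrastructure;
5. Build performance baselines, monitoring, and alerting to improve observability and incident response efficiency.

Requirements
1. Bachelor's degree or above in computer science or a related field;
2. Familiarity with PyTorch distributed training (at least one of DDP or FSDP);
3. Familiarity with the GPU training stack (CUDA, NCCL);
4. Hands-on performance optimization experience (profiling; communication, I/O, or operator optimization);
5. Solid engineering practices and troubleshooting skills.

Preferred Qualifications
- In-depth experience with FSDP (sharding strategies, mixed precision, activation checkpointing, etc.);
- Familiarity with scheduling systems (K8s, Slurm, or in-house);
- Experience operating or optimizing large-scale training clusters.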

About Pony.ai

Pony AI Inc. (“Pony.ai”) is a global leader in the large-scale commercialization of autonomous mobility.

Leveraging its vehicle-agnostic Virtual Driver technology, a full-stack autonomous driving technology that seamlessly integrates its proprietary software, hardware, and services, Pony.ai is developing a commercially viable and sustainable business model that enables the mass production and deployment of vehicles across transportation use cases.

Founded in 2016, Pony.ai has expanded its presence across China, Europe, East Asia, the Middle East, and other regions, ensuring widespread accessibility to its advanced technology.

Pony.ai is among the first in China to obtain licenses to operate fully driverless vehicles in all four Tier-1 cities in China (Beijing, Guangzhou, Shanghai, Shenzhen) and has begun to offer public-facing, fare-charging robotaxi services without safety drivers in Beijing, Guangzhou and Shenzhen. Pony.ai operates a fleet consisting of over 250 robotaxis.

To date, Pony.ai has driven nearly 45 million autonomous testing and operation kilometers on open roads worldwide.

Industry
IT & Software
Company Size
501-1,000 employees
Headquarters
Fremont, California
Year Founded
2016
Website
pony.ai