Job Description

The Applied Machine Learning (AML) - Enterprise team provides machine learning platform products on VolcanoEngine. These include a cloud resource scheduling system that intelligently orchestrates tasks and jobs to minimise the cost of each experiment and maximise resource utilisation; rich modelling tools, including customised machine learning tasks and a web IDE; and multi-framework, high-performance model inference services.

1. Ensure the reliability and normal operation of multiple core systems related to the Viking Team's big data and online services, with a focus on system capacity planning and stability assurance;
2. Enhance system visibility by monitoring the availability and performance metrics of system components, help development teams locate faults quickly, and ensure stability on critical paths such as AI search/vector databases;
3. Improve the reliability, scalability, and performance of services to ensure that the core system SLA is met;
4. Participate in the design and implementation of the automation platform, ensuring rapid iteration and efficient operation and maintenance of large-scale online Viking clusters and AI search-related clusters;
5. Optimize service governance practices in depth, based on the usage scenarios of the AI Search/Viking business. This includes, but is not limited to, analyzing performance bottlenecks on key AI Search/Viking paths, locating and troubleshooting business problems, and driving upgrades to the system's high-availability architecture. Candidates familiar with Viking-related technologies are preferred for core optimization work.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services - over 2.5bn people interact with ByteDance products including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we aim towards achieving every day. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry

IT & Software

Company Size

10,000+ employees

Headquarters

China, CN

Year Founded

Unknown

Website

bytedance.com

Site Reliability Engineer - AI Application
