ByteDance

Research Scientist - Data Center AIOps & Infrastructure - Global Frontier Tech Recruitment Program - 2027 Start (PhD)

ByteDance  •  Singapore, SG (Onsite)  •  10 days ago

Job Description

We are looking for talented individuals to join our team in 2027. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.
Successful candidates must be able to commit to an onboarding date by the end of 2027. Please state your availability and graduation date clearly in your resume.

Team Introduction: The team is responsible for the technical planning and management of built-to-suit and leased data center projects in the Asia Pacific region, for the standardisation and research and development of data center infrastructure technology, and for promoting green energy and achieving carbon neutrality as part of our global roadmap.

Topic Content: As intelligent computing and the AIDC industry develop rapidly, data center rack power density continues to rise. Traditional infrastructure struggles with heat exchange efficiency, water consumption, and energy use. At the same time, large amounts of maintenance data go underused, and maintenance relies heavily on human experience, making it hard to meet the industry's quality and policy demands. This topic focuses on two main areas: technology innovation and smart operation. First, it aims to innovate in liquid cooling, water-saving or water-free cooling, and power supply and energy storage in order to overcome current limits, meet high-density power needs, and comply with carbon reduction and water-saving policies. Second, it builds an AI agent system for data center maintenance that uses large AI models to learn from vast, varied multimodal maintenance data. This system enables multiple AI agents to work together to automate everything from monitoring and diagnosis to repair, creating an intelligent closed loop that optimizes power usage effectiveness (PUE) and shifts maintenance from "reacting after problems" to "predicting and fixing them automatically". By combining new hardware technology with AI-driven maintenance, the project seeks to improve data center energy efficiency, reliability, and operational efficiency.

Topic Challenges:
1. Multi-technology collaborative innovation: Liquid cooling technology must overcome challenges in efficient heat transfer and system reliability. Water-saving cooling sources need to achieve low water-efficiency temperature (WET) and coordinate dry and wet cooling methods. Power supply, distribution, and energy storage must solve source-grid-load-storage matching issues and enhance full-process efficiency. Coordinating innovations across these areas is highly complex.
2. Multi-agent collaboration system design: A multi-agent cooperation framework is required for managing complex operation and maintenance processes, enabling end-to-end autonomous execution from monitoring and diagnosis to automatic repair. Integrating these technologies is highly challenging.
3. Intelligent PUE optimization closed loop: Intelligent control of HVAC and power systems should be based on time-series prediction and reinforcement learning, with the goal of surpassing human expert-level optimization. This demands advanced algorithms and strong engineering implementation.
4. Root cause analysis of heterogeneous data: Large models must automatically identify fault root causes and build knowledge from massive heterogeneous monitoring data, which requires strong model comprehension and generalization capability.
5. Technology implementation and adaptation: Hardware innovation must comply with policies and fit industry needs, while AI operations must integrate with existing platforms and tools. Successfully combining and implementing these poses significant challenges.

Topic Value:
1. Solve key technical problems in data centers and increase global competitiveness.
2. Enable water-saving and low-carbon operation, ensuring data centers run efficiently, stably, and in compliance through AI-driven maintenance.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services - over 2.5bn people interact with ByteDance products including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life - a mission we work towards every day. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
China, CN