Job Description
The AI Data and Safety team plays a critical role in advancing Seed's foundational models and AI products across modalities, and in improving AI-native applications built on the Seed model series. We work across the data lifecycle, from defining evaluation approaches and translating user feedback and benchmark outcomes into data requests to building scalable processes that improve data quality and support rapid model iteration.
Our team combines technical and operational capabilities, bringing together multidisciplinary and multilingual talent across product management, data engineering, and data operations. Our work is driven by people who think deeply about model behavior, move quickly to solve complex problems, and bring first-hand experience as both builders and users of models and agents.
In close partnership with internal researchers, industry experts, and leading data vendors, we tackle challenging data problems at the frontier of AI development, helping improve both model performance and user experience.
As more large AI models are developed, high-quality data has become the core fuel driving leaps in model capabilities. Our team, AI Data & Safety (Data Annotation and Evaluation Operations), builds and operates this critical link.
By establishing a scientific and efficient data production and operations system, we transform the "intelligence" scattered across individual experience and organizational knowledge into "data" that models can understand and learn from. This directly drives the iteration of model capabilities and the deployment of AI applications. We serve not only as key infrastructure supporting ByteDance’s AI strategy, but also as the core bridge connecting human wisdom and machine intelligence.
Our specific responsibilities include providing training data, model evaluation, and model operations for ByteDance’s large model business, driving continuous improvement and application of model capabilities.
As a project intern, you will work on impactful short-term projects that give you real-world professional experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests.
Applications will be reviewed on a rolling basis, so we encourage you to apply early.
We support credit-bearing internship registration, subject to the intern's school requirements and company approval.
Your Role Will Involve:
1. Support model evaluation, training, and user-growth-related projects, ensuring that objectives and quality standards are met on time. Identify risks and propose corrective actions as required to keep projects on track.
2. Establish and maintain strong relationships with product managers, project owners, researchers, and external collaborators. Communicate project updates and bottlenecks in a timely fashion to ensure prompt follow-up by project owners or project managers.
3. Develop code-based tools for project needs, such as automating key processes, conducting data analysis, and converting file types and formats to meet the specific requirements of various platforms.
4. Support initiatives to improve annotation operations across multiple data domains, such as Reasoning, General Knowledge, STEM, and Humanities. Create and maintain technical guidelines and casebooks to support consistent, high-quality data production from external parties.
5. Analyze annotation quality, model performance, and dataset coverage through statistical, visual, and programmatic methods. Employ tools like Python (Pandas, NumPy, Matplotlib) and SQL to generate actionable insights, monitor the health of the data pipeline, and support model training operations. Collaborate with model trainers and researchers to inform training strategies and guide data-centric iterative improvements.