Tencent

Research Intern — Coding LLMs

Tencent  •  Singapore, SG (Onsite)

Job Description

Business Unit

Technology Engineering Group (TEG) supports the company and its business groups with technology and operational platforms, and builds and operates Tencent's R&D management systems and data centers. TEG provides users with a full range of customer services. As the operator of the largest network, devices, and data centers in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open-source collaboration, building new platforms, and supporting business innovation.

What the Role Entails

We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding.

Responsibilities

* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis.
* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi-step coding tasks.
* Run experiments to analyze how data, model, and training strategies affect coding capabilities.
* Work with large-scale code corpora, developer activity data, and agentic coding trajectories.

Who We Look For

* Strong programming skills in Python.
* Solid understanding of machine learning and large language models.
* Familiarity with LLM pre-training, mid-training, code models, data curation, evaluation, agents, or tool use.
* Strong experiment design, data analysis, and problem-solving skills.
* Interest in code intelligence, software engineering automation, and agentic coding.

Preferred Qualifications

* Experience with code data processing, GitHub-scale data, synthetic data, LLM evaluation, semantic deduplication, or agentic coding.
* Research experience, publications, or open-source projects in related areas are a plus.

What We Offer

* Access to large-scale real-world coding data and agentic trajectories.
* Rich compute resources and model APIs for fast research iteration.
* Opportunities to work on real-world coding model applications and the full model development loop.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

About Tencent

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.

Founded in 1998 and headquartered in Shenzhen, China, Tencent follows the guiding principle of using technology for good. Our communication and social services connect more than one billion people around the world, helping them keep in touch with friends and family, access transportation, pay for daily necessities, and even be entertained.

Tencent also publishes some of the world's most popular video games and other high-quality digital content, enriching interactive entertainment experiences for people around the globe.

In addition, Tencent offers cloud computing, advertising, FinTech, and other enterprise services to support our clients' digital transformation and business growth.

Tencent has been listed on the Stock Exchange of Hong Kong since 2004.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: Shenzhen, CN
Year Founded: 1998