ByteDance

Structured Data Fusion Large Model Researcher-Risk Control-Soaring Star Talent Program

ByteDance  •  Singapore, SG (Onsite)  •  3 months ago

Job Description

Team Introduction:
The Risk Control R&D Team addresses the challenges posed by malicious activity across ByteDance's products, including Douyin and Toutiao. Its work spans multiple domains of risk governance, such as content, transactions, traffic, and accounts. Using machine learning, multimodal models, and large models, the team works to understand user behavior and content and thereby identify potential risks and issues. By continuously deepening its understanding of the business and of user behavior, the team drives innovation in models and algorithms, aiming to build an industry-leading risk control algorithm system.

Project Objectives:
Optimize and enhance large models' ability to understand and reason about structured data (sequential data, graph data) based on risk control data.

Project Necessity:
Large models have become much better at understanding text and images, but data in risk control scenarios is primarily structured. Integrating this non-text, non-image structured data with large models so that they can better comprehend it remains an industry-wide challenge. This involves three key difficulties:

1. How to effectively align structured information with the NLP semantic space, allowing models to simultaneously understand both data structure and semantic information.
2. How to use appropriate instructions to enable large models to interpret structural information in structured data.
3. How to endow large language models with step-by-step reasoning capabilities for downstream graph learning tasks, so that they can infer more complex relationships and attributes.

Project Content:
Current industry explorations of structured data include:

1. Graph data understanding (e.g., GraphGPT: Enabling large models to read graph data, SIGIR 2024).
2. Graph data RAG (e.g., Microsoft GraphRAG: Unlocking LLM discovery on narrative private data).
3. Sequential data understanding (e.g., StructGPT: A large model reasoning framework for structured data, EMNLP 2023).

However, current efforts mainly focus on understanding single-type structured data, and several challenges remain in risk control scenarios:

1. How to effectively fuse and understand various types of structured data, especially the integration of graph and sequential data.
2. How to address the challenges listed under "Project Necessity", particularly step-by-step reasoning for downstream tasks, which remains underexplored, especially for reasoning over sequential data.

Research Directions:
1. Structured data understanding for large models
2. Retrieval-augmented generation (RAG) over structured data for large models
3. Chain-of-thought reasoning for large models

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment and enterprise services. Over 2.5bn people interact with ByteDance products, including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life, a mission we work toward every day. At ByteDance, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. We are committed to building a safe, healthy and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry: IT & Software
Company Size: 10,000+ employees
Headquarters: China, CN
Year Founded: Unknown