ByteDance

Tech Lead, Machine Learning Engineer - Global E-Commerce (Conversational AI)

ByteDance  •  Singapore, SG (Onsite)  •  3 days ago

Job Description

About the team
We are building the next generation of conversational AI for Global E-commerce — a unified Agent system that learns from every interaction, runs in 30+ languages, and is deployed across one of the largest e-commerce surfaces on the internet. Our 2026 north star is a self-evolving Agent: post-training, harness, tools, memory, and evaluation form one closed loop, and every served conversation becomes training, evaluation, and retrieval signal for the next iteration.

Business surfaces (buyer, seller, dispute & appeals, operations) are the substrate. Our work is foundational LLM + Agent engineering: post-training, agent harness, tool design, memory, evaluation, inference, and multilinguality. We are hiring people who want to push the state of the art of these systems in production, at scale, with hundreds of millions of users in the loop.

What we work on
- LLM post-training & alignment — large-scale SFT, DPO/IPO/KTO, online RL (RLHF / RLAIF / RLVR), reward modeling, preference data curation, long-context training, distillation, and QAT. We train and adapt frontier-class open-weights models (from 7B to 70B+ parameters), as well as our own continually pretrained checkpoints, on internal infrastructure (FSDP / DeepSpeed / Megatron-style stacks).
- Agent foundations — harness design (context engineering, sub-agents, durable execution, parallel tool use), tool design (ACI principles, namespaced surfaces, poka-yoke, instrumented traces), memory (episodic + semantic + skill-shaped), MCP and Skill-style extensibility. We treat tools and prompts as APIs and iterate against production traces.
- Auto-eval and observability — LLM-as-judge calibrated against human agreement, real-traffic replay, failure-mode taxonomies, and regression, safety, cost, and latency harnesses. We have cut root-cause analysis of a single case from ~13 engineer-days to ~3 minutes, fully automated.
- Self-evolving systems — every served conversation becomes a candidate for training data, eval set membership, retrieval index, and skill induction, with privacy and quality gates. The flywheel is the product.
- Inference & serving — vLLM / TensorRT-LLM, MoE, speculative decoding, KV-cache reuse and prompt caching, multi-tenant low-latency serving. Cost per resolved conversation is a first-class metric.
- Multilinguality & locale grounding — 30+ languages, low-resource adaptation, faithful translation, locale-aware reasoning, cross-cultural tone.
- Reasoning & long-context modeling — chain-of-thought / planning post-training, reasoning-trace supervision, long-context training and serving, retrieval-augmented reasoning, self-consistency and verifier models.

Responsibilities
- Set technical direction. Own a multi-quarter roadmap across one or more of: post-training, agent harness, evaluation, self-evolving data flywheel, serving. Translate north-star metrics into a sequence of 2-3 high-ROI bets per quarter and ship them.
- Compound the team. Hire and develop 1-3 strong ICs. Design their work surfaces for growth, not just dispatch. Raise the median technical bar through design review, code review, and 1:1 framing.
- Stay in the loop with the model. Tech Lead is not a management role: you still write the load-bearing PRs, propose the core abstractions, and write the design docs that set the team's ceiling for the next 2-3 quarters.
- Drive cross-team alignment. Partner with foundation-model, infra, product, and adjacent algorithm teams; own sign-off on cross-cutting technical decisions.
- Observability and rollback. Build the per-turn tracing, tool-call analytics, and failure-mode taxonomies that let the team diagnose any regression within hours, not days.

About ByteDance

ByteDance is a global incubator of platforms at the cutting edge of commerce, content, entertainment, and enterprise services. Over 2.5 billion people interact with ByteDance products, including TikTok.

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life, a mission we work toward every day. At ByteDance, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. We are committed to building a safe, healthy, and positive online environment for all our users.

We have over 110,000 employees based in more than 30 countries globally. Join us.

Industry
IT & Software
Company Size
10,000+ employees
Headquarters
China, CN
Year Founded
Unknown