Job Description
About the Team
Join ByteDance's KV caching and storage systems team, where we build and own mission-critical distributed KV caching and storage products powering ByteDance's global infrastructure. Our portfolio includes Redis-compatible services, next-generation shared-storage engines, and performance/cost optimization components, along with a full ecosystem of operational automation, observability, data movement, and recovery capabilities. We serve ByteDance's core business scenarios at massive scale — recommendation, search, ads, e-commerce, messaging, live streaming, and collaboration suites — with strict requirements on availability, latency, throughput, global deployment, and cost efficiency.
Responsibilities
- Design and develop core KV caching and storage systems, including distributed caching systems and Redis-compatible KV storage systems, with a focus on low latency, high throughput, and high availability.
- Build planet-scale reliability, leading or contributing to HA architecture, failure isolation, multi-AZ/multi-region disaster recovery, and large-scale stability engineering for always-on business workloads.
- Drive compute/storage efficiency improvements (CPU, memory, IO, network), including cache hierarchy designs (memory/SSD), read/write amplification reductions, and capacity planning for request traffic at the scale of billions of requests.
- Build a production-grade ecosystem, including automated orchestration (provisioning, scaling, placement, scheduling) and monitoring systems (tracing, profiling, incident response runbooks).
- Implement and evolve capabilities such as bulk load, backup & restore, point-in-time recovery, and tiered storage, and integrate with upstream/downstream data systems to enrich the data ecosystem.
- Research new hardware and technologies; evaluate and land improvements in production using ZNS SSDs, io_uring, RDMA/CXL, and "AI+DB" directions.
The base salary range for this position in the selected city is $202,160 - $368,220 annually.