Job Description
About the Team
We are the Infrastructure System Lab, a hybrid research and engineering group building next-generation AI-native data infrastructure. Our work sits at the intersection of databases, large-scale systems, and AI. We drive innovation across:
- Next-generation databases: We build VectorDBs and multi-modal AI-native databases designed to support large-scale retrieval and reasoning workloads.
- AI for Infra: We leverage machine learning to build intelligent algorithms for infrastructure optimization, tuning, and observability.
- LLM Copilot: We develop LLM-based tooling such as NL2SQL and NL2Chart.
- High-performance cache systems: We develop a multi-engine key-value store optimized for distributed storage workloads. We're also building KV caches for LLM inference at scale.
This is a highly collaborative team where researchers and engineers work side-by-side to bring innovations from paper to production. We publish, prototype, and build robust systems deployed across key products used by millions.
About the Role
We are seeking a highly motivated and technically strong Research Scientist with a PhD in Computer Science, Databases, Information Retrieval, or a related field to join our team. You will design and optimize state-of-the-art vector indexing algorithms that power large-scale similarity search, filtered search, and hybrid retrieval use cases.
Your work will directly contribute to the next-generation vector database infrastructure that supports real-time and offline retrieval across billions or even trillions of high-dimensional vectors.
Why Join Us
- Work on problems at the frontier of AI x systems with huge practical impact.
- Collaborate with a world-class team of researchers and engineers.
- Opportunity to publish, attend conferences, and contribute to open-source.
- Competitive compensation, generous research support, and a culture of innovation.
Responsibilities
- Research and develop new algorithms for approximate nearest neighbor (ANN) search, especially for filtered, hybrid, or disk-based scenarios.
- Optimize existing algorithms for scalability, low latency, reduced memory footprint, and hybrid search support.
- Collaborate with engineering teams to prototype, benchmark, and productionize indexing solutions.
- Contribute to academic publications, open-source libraries, or internal technical documentation.
- Stay current with research trends in vector search, retrieval systems, retrieval-augmented generation (RAG), large language models (LLMs), and related areas.
The base salary range for this position in the selected city is $136,800 to $359,720 annually.