Background and description
Modulai is offering a master's thesis opportunity focused on applying Reinforcement Learning (RL) to improve the capabilities of large language models (LLMs). Reinforcement learning first proved pivotal in aligning LLMs with human preferences, but recent work shows that its role now extends much further: RL has become the dominant paradigm for eliciting reasoning, enabling models to acquire advanced problem-solving strategies and adapt to complex, multi-step tasks.
Recent advancements highlight the transformative role of RL in LLM post-training:
- DeepSeek-R1 demonstrated that reasoning ability can be induced through large-scale RL with verifiable rewards, including a pure-RL variant (R1-Zero) trained with no supervised fine-tuning at all, popularizing RL as the central tool for building reasoning models.
- DeepSeekMath explored how reinforcement learning can enable models to handle multi-step mathematical reasoning, and introduced the RL method now widely used across the field, Group Relative Policy Optimization (GRPO).
- Tulu 3 introduced a family of fully open post-trained models, leveraging Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and a technique dubbed Reinforcement Learning with Verifiable Rewards (RLVR).
- DAPO released a fully open-source, large-scale RL system that refines GRPO with techniques such as Clip-Higher and dynamic sampling to stabilize long chain-of-thought training, surpassing R1-Zero-level results with substantially fewer training steps.
- ReTool introduced reinforcement learning for tool use, showing how LLMs can learn to combine text-based reasoning and code interpreters for complex tasks.
This project aims to investigate RL approaches for improving LLMs in specialized domains (such as reasoning and tool use). You will explore open-weight models, implement and compare RL methods inspired by the latest research, and evaluate how reinforcement learning impacts model capabilities. Through this work, you will contribute to the growing understanding of how RL can shape the next generation of LLMs.
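As a rough illustration of the kind of building block the project would start from, the sketch below pairs a rule-based verifiable reward with the group-relative advantage used in GRPO, where each sampled answer is scored against the mean and standard deviation of its own group. The function names, the exact-match reward rule, the use of the population standard deviation, and the `eps` term are all assumptions made for this example and are not taken from any of the cited codebases.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the last line equals the reference."""
    lines = completion.strip().splitlines()
    final_line = lines[-1].strip() if lines else ""
    return 1.0 if final_line == reference_answer.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumption added here to avoid division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for the same prompt, whose reference answer is "4".
completions = ["2 + 2\n4", "2 + 2\n5", "4", "I am not sure"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"{completion!r}: reward={reward}, advantage={advantage:.3f}")
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. When every completion in a group gets the same reward, all advantages are zero and the group carries no learning signal, which is the situation that DAPO's dynamic sampling is designed to avoid.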
ML techniques and tools
Open-weight LLMs
Reinforcement learning for LLMs
Python, PyTorch, Git, Hugging Face
References
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-AI, 2025): https://arxiv.org/abs/2501.12948
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024): https://arxiv.org/abs/2402.03300
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training (Lambert et al., Allen Institute for AI, 2024): https://arxiv.org/abs/2411.15124
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale (Yu et al., ByteDance Seed & Tsinghua University, 2025): https://arxiv.org/abs/2503.14476
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs (Feng et al., ByteDance Seed, 2025): https://arxiv.org/abs/2504.11536
Background and description
We also offer a master's thesis project in the emerging field of Vision-Language-Action (VLA) models for robotics. VLA models unify computer vision, natural language processing, and robotic control into end-to-end systems, enabling robots to understand visual scenes, interpret human instructions, and execute tasks without manual programming.
Recent research (e.g., Liang et al., 2024) shows that VLA models can perform complex tasks such as “pick up the red mug from the cluttered table.” This thesis invites students to explore and advance these models, contributing to one of the most actively researched directions in AI-powered robotics.
The project scope will be flexible and tailored to the student’s interests and research findings. Students will work with state-of-the-art robotic hardware and GPU clusters, and will receive guidance from experts in AI and robotics.
ML techniques and tools
Python, PyTorch, Git, Hugging Face
Vision-language-action models (multi-modal AI)
Computer vision and natural language processing methods
Real-time control systems and robotic integration
References
OpenVLA: An Open-Source Vision-Language-Action Model (Kim et al., 2024): https://arxiv.org/abs/2406.09246
π0: A Vision-Language-Action Flow Model for General Robot Control (Physical Intelligence, 2024): https://arxiv.org/abs/2410.24164
Gemini Robotics: Bringing AI into the Physical World (Google DeepMind, 2025): https://arxiv.org/abs/2503.20020
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots (NVIDIA, 2025): https://arxiv.org/abs/2503.14734
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (Hugging Face, 2025): https://arxiv.org/abs/2506.01844
Applied Machine Learning projects encompass a wide range of domains, including healthcare, finance, natural language processing, computer vision, and more. This open application invites students to choose projects aligned with their interests and career goals. Do you have an idea of your own? Let us know by describing it.
Finishing a master's degree in machine learning, or in another field with additional courses in machine learning and programming.
Link to relevant GitHub account if available.
Grades for bachelor's and master's.
Updated CV or an updated LinkedIn profile.
*Suitable candidates will be called to one interview before a final decision is made.
The last date for application is June 10, but the process will end earlier if suitable candidates apply before then.
Modulai’s clients range from startups to multinational companies. They all share that machine learning is central to how they operate, compete, and create value.
Our services range from advisory projects and feasibility studies to end-to-end development and refinement of machine learning systems and products.
We use state-of-the-art techniques, always focusing on maximizing business impact, delivering solutions in areas such as credit risk, fraud detection, dynamic pricing, recommendation systems, computer vision, natural language processing, opportunity spotting, logistics optimization, up-sell, cross-sales, smart building optimization, predictive maintenance, and route planning.
When doing a master's thesis project at Modulai, you are invited to all team activities, such as daily stand-ups, weekly learning breakfasts, and monthly AWs. We look forward to having you as part of our team!

Modulai is an opinionated, no-bullshit AI partner. We turn AI into real business impact: no fluff, just results.
Our mission is simple: solve real business problems with hands-on machine learning and AI.
We work across diverse projects with global enterprises and early-stage startups.
Our team is deeply collaborative, and we believe in learning through doing, sharing knowledge and constantly pushing the boundaries of what ML can achieve.