Claas A. Voelcker

PostDoc at the University of Texas at Austin, RL researcher focused on too many things, he/him, 🏳️‍🌈 🤖 🧙


GDC 4.306

2317 Speedway

AUSTIN, TX 78712

I am a PostDoc focused on Reinforcement Learning and Machine Learning at the University of Texas at Austin, where I work with Peter Stone and Amy Zhang. Previously, I received my PhD from the University of Toronto and the Vector Institute, where I was fortunate to be advised by Profs. Amir-massoud Farahmand and Igor Gilitschenski.

Originally from Germany, I received Bachelor's and Master's degrees with honors from the University of Darmstadt. There, I had the great pleasure of being supervised and mentored by Profs. Kristian Kersting and Jan Peters.

I am proud to serve as a core organizer for Queer in AI, where I help promote the interests of queer researchers and practitioners at AI/ML conferences and in the wider community.

Research Vision

To make good decisions, intelligent agents need to evaluate the consequences and quality of their actions. In Reinforcement Learning, this quality is captured by the value function. My driving research question is how we can enable autonomous agents to learn good value functions and to accurately estimate the impact of their actions. To this end, I have worked on a variety of techniques that make value learning fast, efficient, and accurate.
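To make this concrete, the core mechanic of value learning can be sketched with tabular TD(0) on a toy chain MDP. This is an illustrative textbook example, not code from any of the papers below; the environment, function name, and parameters are all invented for illustration:

```python
import numpy as np

def td0_value_estimation(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) on a toy chain: the agent walks right until the
    terminal state and receives reward 1.0 only on the final transition."""
    V = np.zeros(n_states + 1)  # index n_states is terminal, V(terminal) = 0
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0
            # TD(0) update: nudge V(s) toward the bootstrapped target
            # r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[:n_states]
```

On this chain the true values are V(s) = γ^(n−1−s), and the estimates converge to them; most of my research concerns making this basic bootstrapping mechanism fast and stable when the table is replaced by a neural network.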

Works that train and leverage value functions

  • In Update-free Steering, we show how value functions can be used to improve pre-trained robotics policies at execution time.
  • In REPPO, we present an algorithm that leverages strong action-value function learning for lightning-fast on-policy improvement in hard robotics tasks.
  • In MAD-TD, we show how simulated data from a learned world model can improve an agent’s value estimation.
  • In When does self-prediction help?, we analyze different auxiliary tasks and explain how they help stabilize value learning.
  • In Dissecting Deep RL, we investigate architectural regularizations that prevent agents from overestimating the value of their actions.
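The idea of mixing real and model-generated data for value learning, studied in MAD-TD, goes back to the classic Dyna architecture. Below is a toy Dyna-Q-style sketch, not the MAD-TD algorithm itself: the chain environment, function name, and all parameters are invented for illustration. Each real environment step is followed by extra value updates on transitions replayed from a learned (here: memorized) model:

```python
import numpy as np

def dyna_q_chain(n_states=6, gamma=0.95, alpha=0.5, planning_steps=10,
                 episodes=50, eps=0.1, seed=0):
    """Toy Dyna-Q on a chain MDP: actions 0 (left) / 1 (right),
    reward 1.0 on reaching the rightmost terminal state."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    model = {}  # (s, a) -> (r, s_next): a deterministic learned model
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            if rng.random() < eps:
                a = int(rng.integers(2))
            else:
                # greedy action with random tie-breaking
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # real-data Q-learning update (no bootstrap past the terminal)
            target = r + gamma * Q[s_next].max() * (s_next != n_states - 1)
            Q[s, a] += alpha * (target - Q[s, a])
            model[(s, a)] = (r, s_next)
            # planning: replay model-generated transitions to refine Q
            keys = list(model)
            for _ in range(planning_steps):
                ps, pa = keys[rng.integers(len(keys))]
                pr, psn = model[(ps, pa)]
                ptarget = pr + gamma * Q[psn].max() * (psn != n_states - 1)
                Q[ps, pa] += alpha * (ptarget - Q[ps, pa])
            s = s_next
    return Q
```

The planning loop lets reward information propagate through the value estimates far faster than real experience alone; the hard part in deep RL, which MAD-TD addresses, is doing this when the model itself is learned and imperfect.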

news

Mar 12, 2026 New papers! We released a preprint of “Update-Free On-Policy Steering via Verifiers”, a method that uses on-policy value functions to steer pre-trained robotics policies in the real world! We are also very grateful that “Relative Entropy Pathwise Policy Optimization” was accepted to ICLR 2026. See you in Rio!
Feb 12, 2026 I gave a talk on REPPO at the BeNeRL Seminar. You can find my slides here.
Nov 04, 2025 I finally started my postdoc position at the University of Texas at Austin! So excited for the coming years filled with RL and robotics discoveries.
Oct 02, 2025 Our new paper Relative Entropy Pathwise Policy Optimization has a blog post that walks through everything you need to know about implementing it yourself and understanding the technical bits and pieces.
Jul 01, 2025 Our paper Calibrated Value-Aware Model Learning with Probabilistic Environment Models will be presented at ICML 2025 in Vancouver next week! Let me know if you want to meet up for a coffee.

selected publications

2026

  1. Relative Entropy Pathwise Policy Optimization
    Claas A. Voelcker, Axel Brunnbauer, Marcel Hussing, Michal Nauman, Pieter Abbeel, and 3 more authors
    International Conference on Learning Representations, Apr 2026
  2. Update-Free On-Policy Steering via Verifiers
    Maria Attarian, Ian Vyse, Claas Voelcker, Jasper Gerigk, Evgenii Opryshko, and 4 more authors
    Mar 2026

2025

  1. MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
    Claas A. Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, and Igor Gilitschenski
    International Conference on Learning Representations, Apr 2025
  2. Calibrated Value-Aware Model Learning with Probabilistic Environment Models
    Claas A. Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, and 1 more author
    International Conference on Machine Learning, Jul 2025

2024

  1. Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence
    Marcel Hussing, Claas A. Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, and Eric Eaton
    Reinforcement Learning Conference, Aug 2024
  2. When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning
    Claas A. Voelcker, Tyler Kastner, Igor Gilitschenski, and Amir-massoud Farahmand
    Reinforcement Learning Conference, Aug 2024

2023

  1. λ-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces
    Claas A. Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, and Amir-massoud Farahmand
    arXiv preprint arXiv:2306.17366, Nov 2023

2022

  1. Value Gradient weighted Model-Based Reinforcement Learning
    Claas A. Voelcker, Victor Liao, Animesh Garg, and Amir-massoud Farahmand
    International Conference on Learning Representations, Apr 2022