Robot Learning Workshop

ICLR 2025 · Read our full coverage of this year's conference

This year, for the first time at ICLR after several previous iterations at NeurIPS, there was a Robot Learning workshop. Based on the ICLR conference app, over 700 people added the workshop to their agenda for today (only Agentic AI for Science and Sparsity in LLMs were more popular among Sunday’s 20 workshops), and the organisers estimated a peak attendance of around 300 in the room.

Researchers gather around posters stuck to the wall of the room. Some are engaged in conversation. — Robot Learning Workshop poster session

The best paper award for this workshop went to Instant Policy: In-Context Imitation Learning via Graph Diffusion by Vitalis Vosylius and Edward Johns. Vosylius’ talk showed the effectiveness of their approach in terms of generalisation, robustness, and learning efficiency. It also showed the ability to replicate demonstrations done using different embodiments, such as a robot arm copying the behaviour of a human arm.

The runner-up award went to Max Sobol Mark et al.’s Policy-Agnostic RL paper. There were many other interesting papers at this workshop, and the poster sessions were packed. Several of the poster presenters also had the opportunity to give a 5-minute oral presentation of their work.

Chelsea Finn addressing the audience from behind a lectern, smiling and holding her hands up in the air. — Invited speaker Chelsea Finn presenting 'Data-Driven Pre-Training and Post-Training for Robot Foundation Models'

The workshop featured seven invited talks on topics including home robots, superhuman-level quadcopter drone control, robot foundation models, and 1v1 humanoid robot soccer, as well as a panel discussion.

There were several recurring themes throughout the talks and panels. Notably, many of the presenters agreed on the following points:

Practical, useful, general multitask robotics remains a very hard problem. Impressive-looking cherry-picked demo videos can give people outside the field an inaccurate and overly optimistic view of progress.
Better evaluation standards are needed. Unlike in other areas of ML, where the performance of new methods on benchmark datasets can be measured with relative ease, the heterogeneity of robotic embodiments, sensor stacks, abstraction levels, task specifications and world representations complicates evaluation and comparability.
Learning efficiency remains a hurdle to progress. There is no “internet of robot data” that can be pre-trained on, and real-world data collection is expensive. Using the right abstraction appears to be crucial: in both the quadcopter and robot soccer talks, the learning process became vastly more efficient when operating over higher-level abstractions (e.g. depth-map images or ground-truth state data) as opposed to raw camera pixel inputs.
There are multiple promising approaches to robot learning — including imitation learning and reinforcement learning — and it’s likely that successful future systems will use a combination of these methods.

Sandy Huang, Chaoyi Li, Niresh Dravin, Animesh Garg, and Edward Johns sit on chairs on a stage, facing the audience. Dravin is speaking into a microphone while passing another microphone to Li. Most of the panelists are smiling. — Panel discussion: Sandy Huang, Chaoyi Li, Niresh Dravin, Animesh Garg, and Edward Johns

There were also areas where opinions among speakers differed, or where there was more uncertainty expressed:

What is the ‘best’ representation for robot learning in the real world? Some of the presented work used point clouds, some used bounding boxes or key points on image data, and some used latent representations learned from raw sensor input. This is in contrast to the progress in language modelling, where all current mainstream approaches learn latent representations over sequences of tokens.
What is the relative importance of research in simulation versus research using real-world embodied robots? Simulators have advanced significantly in recent years, but there remains a ‘sim-to-real’ gap, and testing robots in the real world is the only reliable way to see how big that gap is. On the other hand, real-world research is more expensive and labour-intensive, and there are many problems still to be solved in simulation environments.
Why humanoid robots? On the one hand, humanoids are harder to work with than wheeled robots (as one speaker said, “once you introduce legs into the picture, everything becomes much more complicated”). On the other hand, the humanoid form factor could be valuable as a common platform for standardised robotics research.

A child-size humanoid robot sits in a reclining chair behind a steering wheel and pedals. The robot does not have hands with fingers and cannot actually manipulate the steering wheel, but the photo is staged to make it look as if it can. — Sponsor collab: a Booster Robotics humanoid in the FrodoBots driving seat

This was a stimulating workshop with many interesting talks and posters. For more information on the work presented here, and on past and future iterations of this workshop, see the robot learning workshop website.