
ICLR 2025

Contact: Harald Carlens on Whova

Test of Time Award: The Adam Optimiser and Attention
Presentations

The ICLR test of time awards go to ICLR papers from ten years ago that have had a lasting impact on the field.

Adam

The 2025 ICLR test of time award winners were Diederik (Durk) Kingma and Jimmy Ba, for their 2015 ICLR paper Adam: A Method for Stochastic Optimization (arXiv). Both authors shared the presentation, with Kingma on stage and Ba on Zoom.

Durk Kingma, smiling and holding a certificate titled "test of time" while standing next to ICLR Program Chair Fei Sha

Durk Kingma accepting the test of time award from ICLR program chair Fei Sha

Building on Adagrad and RMSProp, the Adam optimisation algorithm adapts per-parameter learning rates using running estimates of the first and second moments of the gradients, leading to much faster training than vanilla stochastic gradient descent. Ten years on, it remains the de-facto optimiser for deep learning (alongside the AdamW variant).
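
For readers who want the mechanics, the core update is short: Adam keeps exponential moving averages of the gradient and the squared gradient, corrects them for initialisation bias, and scales each parameter's step by the ratio of the two. The sketch below is a minimal, framework-free illustration using the paper's default hyperparameters; the function name and shapes are ours for illustration, not a drop-in replacement for a library optimiser.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter vector theta (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction for zero initialisation
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return theta, m, v
```

The division by `sqrt(v_hat)` is what gives each parameter its own effective step size, which is where most of the speed-up over vanilla SGD comes from.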

Based on our analysis of papers in OpenReview, the Adam optimiser or its AdamW variant is mentioned in over half of this year’s ICLR papers, and in almost 90% of ICLR 2025 papers that mention optimisers — with vanilla stochastic gradient descent making up most of the remaining 10%.

As Ba stated during the talk: “desperation drives innovation”.

Kingma and Ba met in London during an internship at Google DeepMind in 2014, and the Adam paper started life as both an overdue course requirement for Ba and Kingma’s desire for better optimisers to train the variational auto-encoders he had developed.

Despite its ubiquity today, Adam’s path to success was not straightforward.

Durk Kingma, standing behind a lectern, smiling at the audience. Behind him, on screen, is an image of the Adam paper with "Rejected" stamped on it in large red letters.

Durk Kingma

Initially rejected from the main ICLR conference track, the authors sent a “fiery rebuttal email” to the ICLR organisers explaining that the reasons given for rejection had already been addressed in revisions made to their paper before the deadline. Eventually, it was accepted as a poster (but not granted an oral presentation).

Many enhancements to Adam have been proposed in the years since, and the authors highlighted two variants: AdamW (Loshchilov and Hutter, ICLR 2019), which decouples weight decay from the gradient update, and Adam-mini (Zhang et al, ICLR 2025), which reduces memory usage and has a poster at this year’s ICLR. Despite this, the standard version of Adam remains in widespread use.

Kingma ended the talk with some comments on the field of AI as a whole, expressing both hopes and concerns.

Neural Machine Translation

The runner-up of the test of time award was the paper Neural Machine Translation by Jointly Learning to Align and Translate (arXiv) by Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio. The presentation was given by Bahdanau.

Another hugely impactful paper, this work is widely credited with introducing and popularising the attention mechanism at the core of the Transformer architecture that underpins so many of today’s state-of-the-art AI models.

Seven people standing on a stage facing the camera. Two of the people in the middle are holding up certificates titled "test of time runner up"

Co-authors Yoshua Bengio and Dzmitry Bahdanau accepting their certificates, accompanied by the ICLR 2025 program chairs and general chair. Not pictured: Kyunghyun Cho.

Bahdanau highlighted contemporaneous work along the same lines including Memory Networks (Weston et al, also at ICLR 2015) and Neural Turing Machines (Graves et al, 2014), all of which went against the grain at a time when recurrent neural networks (RNNs) were the go-to architecture for sequence processing.

While developing the paper, Bahdanau had heard rumours of a big neural-net based translation project within Google, with a much larger compute budget than Bahdanau’s lab. This Google project turned out to be the sequence-to-sequence paper which later won the NeurIPS 2024 test of time award.

Once again, desperation drove innovation, and Bahdanau looked for ways to model long-term dependencies (i.e., reliably translate long sentences) that could perform well using only the 4 GPUs in his lab.

After a few failed attempts, one key idea made a big difference: letting the model “search” for the required information across the source sequence. Shortly before publication, the search terminology was replaced by the phrase “a mechanism of attention”.
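
In modern terms this is additive (Bahdanau) attention: score every encoder state against the decoder’s current state, normalise the scores into weights, and take a weighted sum over the sequence as the context for the next output word. The sketch below is a simplified NumPy illustration; the names, shapes, and parameterisation are ours rather than the paper’s exact formulation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()                       # numerical stability
    e = np.exp(x)
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """decoder_state: (d,); encoder_states: (T, d); W_s, W_h: (a, d); v: (a,)."""
    scores = np.tanh(encoder_states @ W_h.T + decoder_state @ W_s.T) @ v  # (T,) alignment scores
    weights = softmax(scores)             # how much to "attend" to each source position
    context = weights @ encoder_states    # (d,) weighted sum of encoder states
    return context, weights
```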

This attention mechanism was adopted in the 2017 paper Attention is All You Need, which introduced the Transformer architecture, and Bahdanau credited this paper with four great ideas:

  1. Do attention at all layers, not just the top
  2. Attend to the previous layer for all positions in parallel
  3. Use many attention heads
  4. Ditch the RNN

Bahdanau also ended his talk with a discussion on the field of AI as a whole. At this point, as if on cue, a loud thunderstorm above the venue punctuated proceedings with ominous rumbles.

Risks, Concerns, and Mitigations

Both Kingma and Bahdanau ended their talks by expressing concerns about the field of AI, and the impact it could have if sufficient mitigations are not taken.

While acknowledging other categories of risks, they both focused on the potentially destabilising political and economic effects of widely-deployed powerful AI systems.

Kingma called for mitigations in the form of technological countermeasures, sensible AI regulations, and a strengthening of social support systems.

"Dzmitry Bahdanau standing behind a lectern, addressing the audience"

Dzmitry Bahdanau

Bahdanau highlighted the importance of private, local, and cheap-to-run AI systems, and called for researchers to treat amortised local inference cost as a key consideration when developing models.

How to pronounce ICLR

What’s the canonical way to pronounce ICLR?

Samuele Marro from the University of Oxford has been taking a data-driven approach.

Samuele Marro holding a large poster with hand-written tally marks in columns titled "eye-clear" and "I-C-L-R". A pen is attached at the bottom.

Samuele Marro's data gathering project

So far, the results are largely in favour of “eye-clear” over “I-C-L-R”.

Outstanding Papers

There were three winners of the outstanding paper award this year, and three runner-up papers.

For the second time ever, ICLR is also awarding a test of time award for papers from ICLR 2015 which have had sustained impact. More on these tomorrow.

Listed below are the winners and runners-up of the outstanding paper awards.

Winners

Safety Alignment Should be Made More Than Just a Few Tokens Deep

Oral Presentation Paper

TL;DR: We identify an underlying problem (shallow safety alignment) that makes current safety alignment vulnerable, and we also propose approaches for mitigations.

Authors: Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson.

Learning Dynamics of LLM Finetuning

Oral Presentation Paper

TL;DR: The paper proposes a novel learning dynamics framework to understand LLM behavior during finetuning (e.g., SFT, DPO, and other variants). Some counter-intuitive behavior can be well explained by the proposed framework.

Authors: Yi Ren, Danica J. Sutherland.

AlphaEdit: Null-Space Constrained Model Editing for Language Models

Oral Presentation Paper

TL;DR: We propose a novel model editing method named AlphaEdit to minimize the disruption to the preserved knowledge during editing.

Authors: Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Jie Shi, Xiang Wang, Xiangnan He, Tat-Seng Chua.

Honourable mentions

Data Shapley in One Training Run.

Oral Presentation Paper

TL;DR: We develop a new notion of Data Shapley that requires only one model training run.

Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia.

SAM 2: Segment Anything in Images and Videos.

Oral Presentation Paper

TL;DR: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.

Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollar, Christoph Feichtenhofer.

Faster Cascades via Speculative Decoding.

Oral Presentation Paper

TL;DR: Faster language model cascades through the use of speculative execution.

Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar.

Invited Talks: Zico Kolter & Song-Chun Zhu

Zico Kolter: Building Safe and Robust AI Systems

Invited Talk

In this year’s first invited talk, Zico Kolter started with a look back at the work presented at ICLR 2015 ten years ago. Out of a total of just 31 main-conference papers that year, several have had a major impact on the field — including the Adam optimiser and Neural Machine Translation papers, which won the test of time awards and will be discussed in more depth later this week.

Zico Kolter holding a microphone and addressing the audience.

Zico Kolter

Kolter presented years of his lab’s work through four eras: optimisation, certified adversarial robustness, empirics of deep learning, and AI Safety. He highlighted two recent pieces of work in the AI Safety category: antidistillation sampling (generating text from a model in a way that makes distillation harder while keeping the outputs generally useful) and safety pretraining (incorporating safety guardrails early in the model training process, not just in post-training).

Kolter ended with a call to action suggesting that AI Safety should be a key area of focus for academic research today, and emphasising his expectation that work in this area will have a significant impact on the future development of the field.

Song-Chun Zhu: Framework, Prototype, Definition and Benchmark

Invited Talk

Song-Chun Zhu’s talk started from a philosophical vantage point, with a reflection on how “AGI” might be defined, and how any such definition hinges on the definition of what it means to be human.

A large, packed hall with an audience watching Song-Chun Zhu's talk. At the front, multiple large screens show presentation slides and enlarged images of the speaker.

Song-Chun Zhu presenting in Hall 1

Zhu then explored the space of cognitive agents through his three-dimensional framework, which considers the agent’s cognitive architecture (how the agent works), its potential functions (what it can do), and its value function (what it wants to do).

He also summarised some of his lab’s research, including the development of TongTong, an agent trained in a simulated physical environment, as well as the Tong Test benchmark aimed at evaluating AGI.

Singapore Expo: getting there, and getting around

Staying Dry

This year’s ICLR takes place at Singapore Expo, in halls 1-4.

The whole week is due to be warm, humid, and rainy at times, so it’s helpful to have a route to the conference venue that avoids outdoor walking where possible.

For those taking the MRT, the Expo MRT station’s Exit A connects directly to a covered walkway that leads into Singapore Expo.

An underground passage. Up escalators are visible on the left. A sign points left, indicating 'A: Expo Halls 1-6'.

MRT exit A, leading to the Expo

Changi City Point Mall

The Expo MRT station is also connected to Changi City Point mall: Exit F connects to the basement level of the mall.

An open passage with shops visible beyond. A sign says 'F: Changi City Point'. An escalator is visible on the left.

MRT Exit F and the basement level of Changi City Point mall

There are some useful amenities here: the electronics store Challenger is right by the MRT exit, and there’s a pharmacy — Watsons — a bit further along.

The mall can be a good route to the conference venue.

For those at the Dorsett Changi City hotel, the best route is likely to go into the mall, down to B1, and into the MRT through Exit F. Then through the walkway, and out through Exit A to the Expo.

A wide underground walkway. A sign indicates directions to Singapore Expo and other locations.

The underground walkway connecting MRT exit A with exits D & F

For those staying at the Park Avenue Changi hotel or just looking to cross Changi South Ave 1 when it’s rainy, it looks like the best route is to take Exit D into the MRT, and come out through Exit A.

An above-ground open structure containing an elevator and two escalators.

MRT Exit D

MRT exits A, D, and F all have elevators as well as escalators, and can be accessed without needing to pass through ticket gates.

A map showing how Changi City Point mall, the MRT station, and Singapore Expo are connected.

Map of the Expo MRT station exits

Taxis

Uber doesn’t operate in Singapore. Alternatives are the traditional city taxis, or the ride-hailing apps Gojek and Grab.

There’s a taxi stand by Apex Gallery (near Hall 1), and a second taxi stand a little further away near Hall 6.

Registration

This year’s ICLR starts on Thursday. There’s a packed schedule alongside plenty of social events during the five days of the conference.

A group of people line up behind a desk that says "registration". In the foreground, a sign reads 'ICLR: The Thirteenth International Conference on Learning Representations, April 24-28, 2025, Singapore', with directions for Registration, Luggage Room, and First Aid.

Early registration is today 2-7pm. On Thursday registration is open from 7:30am, and on the other conference days registration will start at 8am. For those who have pre-ordered lunch, there’s a separate line to pick up lunch vouchers.

An outdoor covered walkway with people walking through it. A sign reads 'Welcome to Singapore Expo'. A banner welcomes people to ICLR.

The conference venue is right by the Expo MRT station, a 15-minute ride from Singapore’s Changi airport and half an hour from downtown Singapore.

It’s likely to rain this week. For tips on staying dry on the way to the venue, see this entry.

ICLR 2025 socials, happy hours, and dinners

A list of all the social events happening around ICLR 2025 in Singapore. Updated regularly.

Official ICLR socials taking place at the conference venue are marked with an asterisk. Most others require registration and will probably fill up quickly!

Tuesday 22nd

Wednesday 23rd

Thursday 24th

Friday 25th

Saturday 26th

Sunday 27th

Something missing? Message me on the Whova conference app - search “Harald Carlens” under Attendees.