The ICLR test of time awards go to ICLR papers from ten years ago that have had a lasting impact on the field.
Adam
The 2025 ICLR test of time award winners were Diederik (Durk) Kingma and Jimmy Ba, for their 2015 ICLR paper Adam: A Method for Stochastic Optimization (arXiv). Both authors shared the presentation, with Kingma on stage and Ba on Zoom.
Durk Kingma accepting the test of time award from ICLR program chair Fei Sha
Building on Adagrad and RMSProp, the Adam optimisation algorithm adapts the learning rate of each parameter individually, leading to much faster training than vanilla stochastic gradient descent, and it remains the de facto optimiser for deep learning ten years on (alongside the AdamW variant).
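For reference, the core of Adam's update rule can be sketched in a few lines of NumPy. This is a minimal sketch with our own variable and function names; the hyperparameter defaults follow the paper:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given the gradient `grad`.

    `m` and `v` are the running (biased) first- and second-moment estimates,
    and `t` is the 1-indexed step count. Call once per minibatch, carrying
    `m` and `v` (arrays the same shape as `theta`) between steps.
    """
    m = beta1 * m + (1 - beta1) * grad            # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return theta, m, v
```

Each parameter's effective step size is scaled by an estimate of its recent gradient magnitude, which is what gives the per-parameter adaptation. The AdamW variant mentioned below keeps this step unchanged but applies weight decay directly to the parameters rather than folding an L2 penalty into the gradient.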
Based on our analysis of papers in OpenReview, the Adam optimiser or its AdamW variant is mentioned in over half of this year’s ICLR papers, and in almost 90% of ICLR 2025 papers that mention optimisers — with vanilla stochastic gradient descent making up most of the remaining 10%.
As Ba stated during the talk: “desperation drives innovation”.
Kingma and Ba met in London during an internship at Google DeepMind in 2014, and the Adam paper started its life both as an overdue course requirement for Ba and as Kingma’s desire for better optimisers to train the variational auto-encoders he had developed.
Despite its ubiquity today, Adam’s path to success was not straightforward.
Durk Kingma
After the paper was initially rejected from the main ICLR conference track, the authors sent a “fiery rebuttal email” to the ICLR organisers explaining that the reasons given for rejection had already been addressed in revisions made to the paper before the deadline. Eventually, it was accepted as a poster (but not granted an oral presentation).
Many enhancements to Adam have been proposed in the years since, and the authors highlighted two variants: AdamW (Loshchilov and Hutter, ICLR 2019), which decouples weight decay from the gradient update, and Adam-mini (Zhang et al, ICLR 2025), which reduces memory usage. The latter has a poster at this year’s ICLR. Despite this, the standard version of Adam remains in widespread use.
Kingma ended the talk with some comments on the field of AI as a whole, expressing both hopes and concerns.
Neural Machine Translation
The runner-up of the test of time award was the paper Neural Machine Translation by Jointly Learning to Align and Translate (arXiv) by Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. The presentation was given by Bahdanau.
Another hugely impactful paper, this work is widely credited with introducing and popularising the attention mechanism at the core of the Transformer architecture that underpins so many of today’s state-of-the-art AI models.
Co-authors Yoshua Bengio and Dzmitry Bahdanau accepting their certificates, accompanied by the ICLR 2025 program chairs and general chair. Not pictured: Kyunghyun Cho.
Bahdanau highlighted contemporaneous work along the same lines including Memory Networks (Weston et al, also at ICLR 2015) and Neural Turing Machines (Graves et al, 2014), all of which went against the grain at a time when recurrent neural networks (RNNs) were the go-to architecture for sequence processing.
While developing the paper, Bahdanau had heard rumours of a big neural-net based translation project within Google, with a much larger compute budget than Bahdanau’s lab. This Google project turned out to be the sequence-to-sequence paper which later won the NeurIPS 2024 test of time award.
Once again, desperation drove innovation, and Bahdanau looked for ways to model long-term dependencies (i.e., reliably translate long sentences) that could perform well using only the 4 GPUs in his lab.
After a few failed attempts, one key idea made a big difference: letting the model “search” for required info across a sequence. Shortly before publication, the search terminology was replaced by the phrase “a mechanism of attention”.
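As a rough illustration of the idea (not the paper’s exact parameterisation — all names below are ours), additive attention scores each source position against the current decoder state and returns a weighted average of the encoder states:

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Sketch of additive ("Bahdanau-style") attention.

    decoder_state:  shape (d,)    current decoder hidden state
    encoder_states: shape (T, d)  one hidden state per source position
    W_dec, W_enc:   shape (a, d)  learned projections into an alignment space
    v:              shape (a,)    learned scoring vector
    Returns a context vector: the attention-weighted average of encoder states.
    """
    # Score how relevant each source position is to the current decoding step.
    scores = np.tanh(encoder_states @ W_enc.T + decoder_state @ W_dec.T) @ v   # (T,)
    # Softmax over source positions turns scores into attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The weighted sum lets the decoder "search" the source sequence.
    return weights @ encoder_states   # (d,)
```

The weights can be read as a soft alignment between the word being generated and the source positions it draws on.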
This attention mechanism was adopted in the 2017 paper Attention is All You Need, which introduced the Transformer architecture, and Bahdanau credited this paper with four great ideas:
- Do attention at all layers, not just the top
- Attend to the previous layer for all positions in parallel
- Use many attention heads
- Ditch the RNN
Bahdanau also ended his talk with a discussion on the field of AI as a whole. At this point, as if on cue, a loud thunderstorm above the venue punctuated proceedings with ominous rumbles.
Risks, Concerns, and Mitigations
Both Kingma and Bahdanau ended their talks by expressing concerns about the field of AI, and the impact it could have if sufficient mitigations are not put in place.
While acknowledging other categories of risks, they both focused on the potentially destabilising political and economic effects of widely-deployed powerful AI systems.
Kingma called for mitigations in the form of technological countermeasures, sensible AI regulations, and a strengthening of social support systems.
Dzmitry Bahdanau
Bahdanau highlighted the importance of private, local, and cheap-to-run AI systems, and called for researchers to treat amortised local inference cost as a key consideration when developing models.
What’s the canonical way to pronounce ICLR?
Samuele Marro from the University of Oxford has been taking a data-driven approach.
Samuele Marro's data gathering project
So far, the results are largely in favour of “eye-clear” over “I-C-L-R”.
There were three winners of the outstanding paper award this year, and three honourable mentions.
For the second time ever, ICLR is also awarding a test of time award for papers from ICLR 2015 which have had sustained impact. More on these tomorrow.
Listed below are the winners and honourable mentions of the outstanding paper awards.
Winners
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Oral Presentation · Paper
TL;DR: We identify an underlying problem (shallow safety alignment) that makes current safety alignment vulnerable, and we also propose approaches for mitigations.
Authors: Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson.
Learning Dynamics of LLM Finetuning
Oral Presentation · Paper
TL;DR: The paper proposes a novel learning dynamics framework to understand LLM behavior during finetuning (e.g., SFT, DPO, and other variants). Some counter-intuitive behavior can be well explained by the proposed framework.
Authors: Yi Ren, Danica J. Sutherland.
AlphaEdit: Null-Space Constrained Model Editing for Language Models
Oral Presentation · Paper
TL;DR: We propose a novel model editing method named AlphaEdit to minimize the disruption to the preserved knowledge during editing.
Authors: Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Jie Shi, Xiang Wang, Xiangnan He, Tat-Seng Chua.
Honourable mentions
Data Shapley in One Training Run.
Oral Presentation · Paper
TL;DR: We develop a new notion of Data Shapley that requires only one model training run.
Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia.
SAM 2: Segment Anything in Images and Videos.
Oral Presentation · Paper
TL;DR: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos.
Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollar, Christoph Feichtenhofer.
Faster Cascades via Speculative Decoding.
Oral Presentation · Paper
TL;DR: Faster language model cascades through the use of speculative execution.
Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar.
Zico Kolter: Building Safe and Robust AI Systems
Invited Talk
In this year’s first invited talk, Zico Kolter started with a look back at the work presented at ICLR 2015 ten years ago. Out of a total of just 31 main-conference papers that year, several had a major impact on the field from today’s perspective — including the Adam optimiser and Neural Machine Translation papers, which won the Test of Time Award and will be discussed in more depth later this week.
Zico Kolter
Kolter presented years of his lab’s work through four eras: optimisation, certified adversarial robustness, empirics of deep learning, and AI Safety. He highlighted two recent pieces of work in the AI Safety category: antidistillation sampling (generating text from a model in a way that makes distillation harder while keeping the outputs generally useful) and safety pretraining (incorporating safety guardrails early in the model training process, not just in post-training).
Kolter ended with a call to action suggesting that AI Safety should be a key area of focus for academic research today, and emphasising his expectation that work in this area will have a significant impact on the future development of the field.
Song-Chun Zhu: Framework, Prototype, Definition and Benchmark
Invited Talk
Song-Chun Zhu’s talk started from a philosophical vantage point, with a reflection on how “AGI” might be defined, and how any such definition hinges on the definition of what it means to be human.
Song-Chun Zhu presenting in Hall 1
Zhu then explored the space of cognitive agents through his three-dimensional framework, which considers the agent’s cognitive architecture (how the agent works), its potential functions (what it can do), and its value function (what it wants to do).
He also summarised some of his lab’s research, including the development of TongTong, an agent trained in a simulated physical environment, as well as the Tong Test benchmark aimed at evaluating AGI.
Staying Dry
This year’s ICLR takes place at Singapore Expo, in halls 1-4.
The whole week is due to be warm, humid, and rainy at times, so it’s helpful to have a route to the conference venue that avoids outdoor walking where possible.
For those taking the MRT, the Expo MRT station’s Exit A connects directly to a covered walkway that leads into Singapore Expo.
MRT exit A, leading to the Expo
Changi City Point Mall
The Expo MRT station is also connected to Changi City Point mall: Exit F connects to the basement level of the mall.
MRT Exit F and the basement level of Changi City Point mall
There are some useful amenities here: the electronics store Challenger is right by the MRT exit, and there’s a pharmacy — Watsons — a bit further along.
The mall can be a good route to the conference venue.
For those at the Dorsett Changi City hotel, the best route is likely to go into the mall, down to B1, and into the MRT station through Exit F, then through the walkway and out through Exit A to the Expo.
The underground walkway connecting MRT exit A with exits D & F
For those staying at the Park Avenue Changi hotel or just looking to cross Changi South Ave 1 when it’s rainy, it looks like the best route is to take Exit D into the MRT, and come out through Exit A.
MRT Exit D
MRT exits A, D, and F all have elevators as well as escalators, and can be accessed without needing to pass through ticket gates.
Map of the Expo MRT station exits
Taxis
Uber doesn’t operate in Singapore. Alternatives are traditional city taxis and the ride-hailing apps Gojek and Grab.
There’s a taxi stand by Apex Gallery (near Hall 1), and a second taxi stand a little further away near Hall 6.
This year’s ICLR starts on Thursday. There’s a packed conference schedule alongside plenty of social events during the 5 days of the conference.
Early registration is today 2-7pm. On Thursday registration is open from 7:30am, and on the other conference days registration will start at 8am. For those who have pre-ordered lunch, there’s a separate line to pick up lunch vouchers.
The conference venue is right by the Expo MRT station, a 15-minute ride from Singapore’s Changi airport and half an hour from downtown Singapore.
It’s likely to rain this week. For tips on staying dry on the way to the venue, see this entry.