ICML 2025 wrapped up with 33 workshops spread across two days. Workshops let researchers share newer work in a less formal setting than the main conference, with each workshop focused on a specific domain or area of research.
Based on anticipated attendance numbers in the conference app, the three most popular workshops across the two days were:
- Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures
- Exploration in AI Today (EXAIT)
- Foundation Models for Structured Data (FMSD)
Below are a few brief highlights from two of the workshops.
Foundation Models for Structured Data
This was the first ICML workshop on Foundation Models for Structured Data. It covered a broad range of topics related to pre-trained models for tabular and time-series data.
There was a generally-shared view that foundation models for structured data are still in their infancy, with many promising directions for further work.
Andrew Gordon Wilson’s talk (“A Universal Approach to Model Construction”) included some advice on model selection: embrace a highly expressive hypothesis space combined with a bias towards compression. He questioned the view that deep learning is ‘special’ compared to other machine learning approaches, suggesting that the benefits of overparameterisation seen in phenomena like double descent are not unique to deep learning.
For more on this view, see his ICML 2025 position paper Position: Deep Learning is Not So Mysterious or Different.
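The double descent behaviour he refers to is straightforward to reproduce outside deep learning. Below is a minimal sketch of our own (not code from the talk) using minimum-norm least squares on random ReLU features, a classic setting where test error often peaks near the interpolation threshold and then falls again as the model becomes heavily overparameterised:

```python
# Toy double-descent sketch (our illustration, not from the talk):
# minimum-norm least squares on random ReLU features. Test error
# typically peaks near the interpolation threshold
# (n_features ≈ n_train) and then decreases again with more features.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20

# Ground-truth linear signal with observation noise
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

for n_features in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)  # random projection
    phi_train = np.maximum(X_train @ W, 0)             # ReLU features
    phi_test = np.maximum(X_test @ W, 0)
    # lstsq returns the minimum-norm solution when underdetermined --
    # an implicit "compression bias" alongside the expressive features
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"{n_features:5d} features: test MSE = {mse:.3f}")
```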
Josh Gardner’s talk (“Toward the GPT-3 Moment for Tabular Data Models”) reviewed the progress made across the first three GPT models, attributing their success to three main factors (large-scale data, reliable benchmarks, and scalability), before evaluating the state of each of these factors for tabular foundation models.
The talk noted that there is not yet an equivalent of Common Crawl for tabular data, and that much of the available large-scale tabular data is synthetic (TabPFN, for example, is trained entirely on synthetic data; see the sketch below). Most current benchmarks focus on “single-table” prediction, and there is a need for more tabular benchmarks aimed at foundation modelling and few-shot/in-context learning.
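To make the in-context framing concrete, here is a minimal sketch (assuming the open-source `tabpfn` package) of applying a prior-fitted network that was pre-trained only on synthetic tasks to a real single-table problem:

```python
# Minimal sketch of in-context tabular prediction with TabPFN
# (assumes the `tabpfn` package: pip install tabpfn).
# TabPFN "fits" by conditioning on the training rows in a forward
# pass rather than by gradient descent, despite having been
# pre-trained entirely on synthetic tabular tasks.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()     # pre-trained prior-fitted network
clf.fit(X_train, y_train)    # stores the context; no weight updates
print(accuracy_score(y_test, clf.predict(X_test)))
```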
He also highlighted some misconceptions, coining the phrase “The Token Fallacy” for the common belief that models which tokenise numbers cannot represent them effectively, and reminded researchers of the importance of building with exponentially improving compute in mind.
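The intuition behind the fallacy is easy to see by inspecting how a subword tokeniser splits numerals into uneven chunks. A quick illustration using OpenAI’s open-source `tiktoken` library (our example, not from the talk):

```python
# How a subword tokeniser splits numbers (pip install tiktoken).
# Numerals are broken into uneven chunks, which drives the belief --
# the "Token Fallacy" -- that such models cannot represent numbers
# well, a belief Gardner argues is mistaken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for s in ["12345", "3.14159", "1,000,000"]:
    tokens = enc.encode(s)
    print(s, "->", [enc.decode([t]) for t in tokens])
```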
At the end of the workshop, the organisers gave out three best paper awards:
- Best applications paper: Towards Generalizable Multimodal ECG Representation Learning with LLM-extracted Clinical Entities by Mingsheng Cai, Jiuming Jiang, Wenhao Huang, Che Liu, and Rossella Arcucci.
- Best tabular paper: ConTextTab: A Semantics-Aware Tabular In-Context Learner by Marco Spinaci, Marek Polewczyk, Maximilian Schambach, and Sam Thelin.
- Best time-series paper: CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data Only by Shifeng Xie, Vasilii Feofanov, Marius Alonso, Ambroise Odonnat, Jianfeng Zhang, and Ievgen Redko.
AI for Math
This was the second year of the AI for Math workshop at ICML (summary of the previous ICML AI for Math workshop), alongside a similar series of workshops at NeurIPS (NeurIPS 2024 Math-AI workshop coverage).
One recurring theme throughout this workshop was the high-level choice of research direction: does the community want to build systems for fully autonomous mathematical research, or tools to support human reasoning and decision-making?
Some recent work discussed in the workshop included Goedel-Prover-V2, a new state-of-the-art open-weights model for proving theorems in Lean; APE-Bench I, a new proof-engineering benchmark; and CSLib, a new open-source Lean 4 library for foundational results in computer science, as well as an update on the AI Mathematical Olympiad.
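For readers who have not used Lean, here is a small self-contained example (ours, not from the workshop) of the kind of machine-checkable statement and proof that systems like Goedel-Prover-V2 are trained to produce, and that the Lean kernel verifies:

```lean
-- A tiny Lean 4 theorem with an explicit inductive proof.
-- The kernel checks every step, so a generated proof either
-- verifies completely or is rejected.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  induction b with
  | zero => simp                                -- a + 0 = 0 + a
  | succ n ih => simp [Nat.add_succ, Nat.succ_add, ih]
```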
There were two competition tracks in this workshop:
- Track 1, proof engineering (APE-Bench I), was won by Sparsh Tewadia, using Gemini 2.5.
- Track 2, reasoning from physics diagrams (SeePhys), was won by Ruitao Wu, Hao Liang, Bohan Zeng, Junbo Niu, Wentao Zhang, and Bin Dong, using a combination of Gemini 2.5 and OpenAI o3.
There were two best paper awards:
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? by Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang.
- Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training by Wenlong Deng, Yi Ren, Danica J. Sutherland, Christos Thrampoulidis, and Xiaoxiao Li.