Kaggle’s talk included an announcement of an exciting new million-dollar prize for AI software engineering agents.
Kaggle CEO D. Sculley kicked off the session with a primer on empirical rigour, covering what happens when assumptions about data are violated in practice, and how to deal with concerns around leakage and contamination.
After reviewing the pros and cons of static benchmarks and community leaderboards for comparing methods, he explained some of the ways Kaggle has mitigated these issues in its competitions.
One extremely effective mitigation: requiring researchers (or competition participants) to submit their models before the test data is generated. In the CAFA 5 Protein Function Prediction competition, for example, this meant gathering predictions before measurements for the test set proteins were made in the lab.
Sculley was joined on stage first by Carlos Jimenez and John Yang of SWE-bench (a benchmark for automated code-generation systems measuring their ability to resolve GitHub issues), and then by Databricks cofounder Andy Konwinski.
After Jimenez and Yang explained SWE-bench, Konwinski announced the million-dollar challenge live on stage by posting a tweet.
For more details on the prize, see kprize.ai or the Kaggle competition page.