Kaggle’s talk included the announcement of an exciting new million-dollar prize for AI software-engineering agents.
Kaggle CEO D. Sculley kicked off the session with a primer on empirical rigour, covering what happens when assumptions about data are violated in practice, and how to deal with concerns around leakage and contamination.
D. Sculley discussing empirical rigour
After reviewing the pros and cons of static benchmarks and community leaderboards for comparing methods, he explained some of the ways Kaggle has mitigated these issues in its competitions.
One extremely effective mitigation: requiring researchers (or competition participants) to submit their models before the test data is generated. In the CAFA 5 Protein Function Prediction competition, for example, this meant gathering predictions before measurements for the test set proteins were made in the lab.
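In Kaggle's case the guarantee comes simply from accepting submissions before the lab measurements exist, but the same temporal-ordering property can be illustrated with a commit-then-reveal scheme: participants submit a cryptographic hash of their frozen predictions up front, and scoring later accepts only predictions matching that hash. A minimal sketch (the function names and example data are hypothetical, not Kaggle's actual pipeline):

```python
import hashlib
import json


def commit(predictions: dict) -> str:
    """Hash a participant's frozen predictions so the organiser can later
    verify they were fixed before any test labels existed."""
    # Canonical serialisation so the same predictions always hash identically.
    blob = json.dumps(predictions, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def verify(predictions: dict, commitment: str) -> bool:
    """Check that revealed predictions match the earlier commitment."""
    return commit(predictions) == commitment


# Phase 1: before test labels exist, each team submits only a commitment.
frozen = {"protein_A": 0.91, "protein_B": 0.12}
receipt = commit(frozen)

# Phase 2: once labels arrive from the lab, scoring accepts the predictions
# only if their hash matches the receipt — post-hoc edits are detectable.
assert verify(frozen, receipt)
assert not verify({"protein_A": 0.99, "protein_B": 0.12}, receipt)
```

The key property is the one Sculley described: because the predictions are fixed before the ground truth is generated, no amount of leakage or contamination in public data can retroactively improve a team's score.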
Sculley was joined on stage first by Carlos Jimenez and John Yang of SWE-bench (a benchmark that measures the ability of automated code-generation systems to resolve real GitHub issues), and then by Databricks cofounder Andy Konwinski.
D. Sculley, Carlos Jimenez, John Yang, and Andy Konwinski.
After Jimenez and Yang explained SWE-bench, Konwinski posted a tweet announcing the million-dollar challenge live on stage:
“I’ll give $1M to the first open source AI that gets 90% on this sweet new contamination-free version of SWE-bench - kprize.ai.”
Andy Konwinski’s live-tweet during the session
For more details on the prize, see kprize.ai or the Kaggle competition page.