CODALAB
The Codalab project aims to meet the needs of companies and research laboratories by scaling up the organization of scientific and technical challenges that integrate big data from sectors such as health, energy, ecology, resource management, and security, among other areas of economic and societal importance. The CodaLab Competitions platform and its new version, Codabench, are powerful open-source frameworks for running scientific competitions and benchmarks involving result or code submission. You can either participate in an existing competition (or benchmark) or host one.
Together with Anne-Catherine Letournel, Adrien Pavao, and Ihsan Ullah, we administer the public instance of Codalab Competitions at Université Paris-Saclay and coordinate the underlying open-source project. More information ...
Adrien and Ihsan just started their own consulting company to organize challenges!
The community is rapidly growing and the public instance hosted at LISN, Université Paris-Saclay, averages 50 new competitions per month.
Why organize challenges?
In the rapidly evolving field of computing, machine learning (data-driven artificial intelligence) stands out as an influential discipline, transforming many aspects of our lives, improving efficiency and productivity, opening new opportunities for innovation and complex problem solving, and contributing to the quality of life of individuals globally. Recent applications include: improving decision-making, particularly in fields like finance and medicine; helping to combat climate change by optimizing the use of resources; personalizing the customer experience in e-commerce, banking, and other industries; improving security and preventing fraud; and improving accessibility for people with disabilities, for example through voice recognition systems, visual aids for the visually impaired, and other assistive technologies.
Despite solid mathematical foundations, these methods often require empirical evaluation to confirm their effectiveness and reliability. This need is intensifying with the increasing complexity of methods, particularly with the emergence of deep neural networks, which are difficult to explain and interpret. Empirical evaluation becomes essential given the complexity of the algorithms and the unpredictable nature of the data. Experimental benchmarks are therefore crucial for comparing models and understanding their behavior. In this respect, artificial intelligence is an experimental science, involving the study of the behavior of computer programs.
The approach taken in this project is that of organizing scientific competitions (also called “challenges”). Scientific competitions systematize large-scale experiments and measure how effectively participants solve complex problems. Annual competitions, organized on the Codalab platform, address various scientific or industrial questions by evaluating the algorithms submitted by participants.
The importance of impartial evaluations of algorithms is constantly increasing with the acceleration of progress in Artificial Intelligence. According to David Donoho:
“The emergence of Frictionless Reproducibility flows from 3 data science principles that matured together after decades of work by many technologists and numerous research communities. The mature principles involve data sharing, code sharing, and competitive challenges, however implemented in the particularly strong form of frictionless open services.”
He cites the Codalab project as being exemplary in this area [11].
The origin of the Codalab project and its evolution
Codalab [1] was established in 2013 as a joint venture between Microsoft and Stanford University. The original vision was to create an ecosystem for conducting computational research in a more efficient, reproducible, and collaborative way, by combining worksheets and competitions. Worksheets capture complex research pipelines in a reproducible way and create “executable documents.”
In 2014, ChaLearn (a non-profit organization created by Isabelle Guyon) joined the effort to co-develop the “Codalab Competitions” platform. Since 2015, Université Paris-Saclay has been the “community lead” of “Codalab Competitions”, under the leadership of Isabelle Guyon, professor of Artificial Intelligence. Codalab is administered by the staff of the Interdisciplinary Laboratory of Digital Sciences (LISN) at Université Paris-Saclay, under the direction of Anne-Catherine Letournel. The number of competitions organized on Codalab is growing steadily (Figure 1).
Figure 1: Evolution of the number of competitions organized on CodaLab per year, between 2013 and 2023. Only serious competitions (significant prize, large number of participants, and/or acceptance at a conference) are taken into account.
Data science journalist Harald Carlens of ML Contests provides a detailed analysis of the 2023 competition landscape and cites Codalab (Figure 2) as the most used platform, ahead of Kaggle (an American platform owned by Google) and Tianchi (the Chinese Alibaba Cloud platform) [12]. Codalab Competitions is therefore the international flagship of artificial intelligence challenge platforms.
Since 2019, we have been developing, in part with the support of the ANR, the Île-de-France Region, and ChaLearn, a new version of “CodaLab Competitions” [1] called Codabench [2], designed to organize both competitions and benchmarks. Codabench is fully backward compatible with Codalab, but its code has been redesigned and better structured, providing more flexibility and security to users. Its beta version went into service in 2021 to run selected competitions and benchmarks. Since August 2023, version 1 has been available to all users.
Figure 2: Comparison between platforms by Harald Carlens [12]. Note: the number of Codalab competitions is greatly underestimated by Carlens, because he only counts competitions offering large prizes, whereas Codalab mainly targets academic and university challenges.
Main characteristics and differentiating factors of Codalab
Most of the competitions hosted on Codalab are machine learning (data science) competitions, but the platform is not limited to this application area. Codalab can accommodate any problem for which a solution can be provided in the form of a zip archive containing a certain number of files, to be quantitatively evaluated by a scoring program provided by the organizers. The scoring program must return a numerical score, which is displayed on a leaderboard where participants' performances are compared.
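As an illustration, here is a minimal sketch of what such a scoring program could look like in Python. It assumes the usual Codalab convention of receiving an input directory (with the reference data under ref/ and the submission under res/) and an output directory, and of writing scores as name: value pairs to scores.txt; the file names reference.txt and prediction.txt, and the accuracy metric, are illustrative choices rather than requirements of the platform.

# Minimal scoring program sketch for a Codalab-style competition.
# Assumed convention (to be adapted to the organizer's setup):
#   python score.py <input_dir> <output_dir>
# where <input_dir>/ref/ holds the ground truth, <input_dir>/res/ holds
# the participant's submission, and scores are written as "name: value"
# lines to <output_dir>/scores.txt.
import os
import sys


def read_labels(path):
    """Read one label per line from a plain-text file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def main():
    input_dir, output_dir = sys.argv[1], sys.argv[2]
    truth = read_labels(os.path.join(input_dir, "ref", "reference.txt"))
    preds = read_labels(os.path.join(input_dir, "res", "prediction.txt"))

    # Example metric: simple classification accuracy.
    correct = sum(t == p for t, p in zip(truth, preds))
    accuracy = correct / len(truth) if truth else 0.0

    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "scores.txt"), "w") as f:
        f.write(f"accuracy: {accuracy:.4f}\n")  # value shown on the leaderboard


if __name__ == "__main__":
    main()

Organizers package a program of this kind inside the competition bundle, and the platform runs it on each submission to populate the leaderboard.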
Table 1: Comparison of platform characteristics [14].
Codalab owes its success to the large number of desirable features it offers (Table 1), to its security, and to the flexibility given to organizers. Here are the main features of Codalab:
Competition Bundle: A competition bundle is a ZIP file containing all the elements necessary to create a competition (data, documentation, scoring program, and configuration settings). Organizers can upload and update these bundles, making the process of creating competitions both simple and flexible (a sketch of a typical bundle layout is given after this list).
Submission of Results and Code: Codalab allows participants to submit either results (predictions) or code. Code submission is particularly beneficial because it allows for more rigorous and reproducible evaluation of methods, ensuring execution in a controlled environment on the server. This also helps prevent cheating and ensures that test data remains hidden.
Compute Workers: Codalab offers compute workers to process submissions. Organizers can also create custom queues and attach their own compute resources, whether physical hardware or virtual machines. This allows great flexibility and ensures that the host institution is not overburdened by computational costs.
Docker Environment: Participant code execution and scoring are performed securely in Docker containers, which isolate applications and their dependencies, ensuring uniform execution across different environments and increased reproducibility.
Phases: Competitions may be structured into multiple phases, each with its own parameters, including separate dates, data, and scoring programs. This allows flexible management and clear segmentation of the different stages of the competition.
Multiple Scores: The leaderboard is highly customizable and can handle multiple scoring functions simultaneously. Participants can be ranked based on the average rank they achieve across all subscores.
Documentation: Codalab's documentation is well organized and categorized according to the different actors involved: participants, organizers, administrators and contributors.
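To make the bundle mechanism more concrete, the sketch below assembles a minimal competition bundle in Python. The configuration fields (title, phases, leaderboard) and the file name competition.yaml are indicative of the kind of settings a bundle contains, but they are not the authoritative schema; the Codalab and Codabench documentation give the exact format.

# Sketch: assembling a Codalab-style competition bundle.
# The field names below are illustrative, not the official schema.
import os
import shutil
import yaml  # pip install pyyaml

config = {
    "title": "My demo competition",
    "description": "Predict labels on the hidden test set.",
    "phases": [
        {"name": "Development", "start_date": "2024-01-01"},
        {"name": "Final", "start_date": "2024-03-01"},
    ],
    "leaderboard": {"columns": ["accuracy"]},
}

# A bundle is simply a ZIP archive gathering the configuration file, the
# data, the documentation pages and the scoring program.
os.makedirs("bundle", exist_ok=True)
with open("bundle/competition.yaml", "w") as f:
    yaml.safe_dump(config, f)
# ... copy data/, scoring_program/ and HTML documentation pages into bundle/ ...

shutil.make_archive("my_competition_bundle", "zip", "bundle")

The resulting ZIP file is what an organizer uploads to the platform to instantiate the competition.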
Compared with its main competitors, Codalab offers a degree of flexibility that is much appreciated in research and teaching. Kaggle, for example, does not allow organizers to train models on the platform, to code their own evaluation functions, or to configure the leaderboard.
Outreach and diversity of organized competitions
A statistical analysis of past challenges hosted on Codalab, carried out by Aleksandra Kruchinina [13] (Figure 3), showed the variety of areas covered and the growing impact of the platform in the scientific community [5]. The largest number of competitions hosted by Codalab are in the areas of computer vision and natural language processing.
This analysis also demonstrated little correlation between the prizes offered and the number of participants, indicating the lesser role of monetary motivations in the academic challenges offered on Codalab; the main motivations are rather the prospects of recognition and publication. According to Google Scholar, more than 4,000 publications refer to Codalab. The platform itself was the subject of a publication [1] and of a book on challenges, Chapter 2 of which formed the basis of a tutorial on Codabench at the NeurIPS conference [SLIDES].
Codalab now has thousands of competitions and more than 50,000 users (see highlights). Most competitions attract between 10 and 100 participants. However, since the implementation of the new server in 2018, more than 200 competitions have attracted over 100 participants. The challenge with the largest number of participants was the LiTS (Liver Tumor Segmentation) Challenge, which offered no prizes but was organized in conjunction with ISBI 2017 and MICCAI 2017; it attracted more than 5,000 participants. The number of submissions per competition also varies widely: a few hundred on average, but sometimes several tens of thousands. The largest prize offered was $40,000. The median duration of competitions is 73 days, and code submission is required for only 16% of competitions.
Among the competitions that attracted the greatest number of participants we note:
Large-scale Video Object Segmentation Challenge: 4000+ YouTube videos.
These competitions, which attract hundreds of participants, are recurring (they take place every year, most often in conjunction with conferences).
We ourselves organize several challenges per year, such as the AutoDL and meta-learning, L2RPN, LAP, AI4ED, and auto-survey challenge series. These challenges are organized in conjunction with prestigious conferences in Machine Learning (NeurIPS, IJCNN, ICML), Computer Vision (ICCV, CVPR) and Artificial Intelligence (AAAI).
Codalab also has a growing role in education. Every year, students of the Master of Artificial Intelligence at Paris-Saclay University organize challenges, which are then solved by other students. Some of these student challenges serve as a basis for organizing international research challenges, such as the “Fair Universe” challenge, sponsored by CERN, to classify particles in the presence of systematic errors in simulators, which was the subject of a hackathon in Paris. Other teachers use Codalab for courses with hundreds of students (Artificial Neural Networks and Deep Learning, Image Analysis and Computer Vision Course IACV, USC coursework, Skoltech transformer course).
Figure 3: Distribution of machine learning domains from competitions hosted on Codalab.
Challenge protocols
The Codalab competition platform and its new version Codabench are powerful open source portals for running scientific competitions and benchmarks, which allow the submission of results or code (Figure 4). Users can either participate in an existing competition or organize one.
The different types of machine learning (AI) challenge protocols reflect the diversity and complexity of the tasks that AI can accomplish. Here is a summary of the main types of challenge protocols discussed in Chapter 7 of Adrien Pavao's thesis [5], funded by this project:
Supervised Learning: In these challenges, models are trained with labeled data, where each input instance has an associated correct output (label). Participants develop models to make accurate predictions on unseen data, using datasets divided into training and testing sets.
Automated Machine Learning (AutoML): AutoML aims to automate the process of building AI models. AutoML competitions evaluate algorithms on a set of tasks, each requiring training a model from scratch. The scores obtained on these tasks are then used to rank the algorithms (see the ranking sketch after this list).
Meta-learning: Meta-learning is about learning to learn. Meta-learning algorithms are evaluated on their ability to adapt to different tasks. They are typically trained in a meta-learning phase before being tested on a new set of tasks.
Time Series Analysis: These challenges involve tasks like anomaly detection or sequence prediction. They can focus on time series regression (predicting a continuous variable) or time series forecasting (predicting future values of the series).
Reinforcement Learning (RL): In RL challenges, an agent learns to make decisions by interacting with an environment. RL competitions may require environmental simulations and evaluate agents based on their ability to maximize cumulative rewards.
Use of Confidential Data: These challenges involve sensitive data. Two approaches are possible: replace real data with synthetic data or run participants' models blindly on real data.
Adversarial Challenges: These challenges involve adversarial processes, where one group of participants creates an artifact (such as synthetic data) and another group develops methods to challenge these artifacts. They can be designed sequentially (the phases are separated) or simultaneously (the phases run in parallel).
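Several of these protocols, notably AutoML competitions and any leaderboard combining several subscores, rank participants by the average rank they obtain across tasks or metrics rather than by a single number. The sketch below shows one plausible way to compute such an average-rank aggregation; the participant names and scores are invented for the example, and ties are resolved by averaging ranks, which is one common convention among several.

# Sketch: ranking participants by average rank across several tasks.
# The scores below are invented; higher is assumed to be better.
from statistics import mean


def ranks(scores):
    """Rank a {participant: score} dict, best score = rank 1, ties averaged."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    rank = {}
    i = 0
    while i < len(ordered):
        # Group participants with identical scores and give them the mean rank.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = mean(range(i + 1, j + 2))
        for name in ordered[i:j + 1]:
            rank[name] = avg
        i = j + 1
    return rank


# One {participant: score} dict per task (or per subscore).
per_task_scores = [
    {"alice": 0.91, "bob": 0.88, "carol": 0.88},
    {"alice": 0.75, "bob": 0.80, "carol": 0.70},
    {"alice": 0.60, "bob": 0.65, "carol": 0.72},
]

per_task_ranks = [ranks(s) for s in per_task_scores]
participants = per_task_scores[0].keys()
average_rank = {p: mean(r[p] for r in per_task_ranks) for p in participants}

# Final leaderboard: smallest average rank wins.
for p in sorted(average_rank, key=average_rank.get):
    print(f"{p}: average rank {average_rank[p]:.2f}")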
Each of these protocols has its own particularities and difficulties, reflecting the richness and variety of AI applications. They require different skills from participants and offer varied perspectives on how AI models can be developed, evaluated, and improved. The Codalab and Codabench documentation offer a large number of challenge templates, and Adrien Pavao has published introductory articles [6, 7] that allow beginners to organize their own challenges.
Figure 4: Workflow of a Codalab competition. Codalab supports two types of challenges: code submission or result submission. In both cases, the predictions made by the models are compared with the ground truth, which must remain confidential (hidden from participants).
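To complement Figure 4 and the scoring sketch given earlier, here is what a minimal code submission could look like. The fit/predict interface driven by an ingestion program is a pattern used in several Codalab challenges (for instance the AutoML series), but each organizer defines their own interface, so the class and method names below should be read as one illustrative convention rather than a platform requirement.

# Sketch of a code submission following a fit/predict pattern.
# The exact interface is defined by each organizer's ingestion program.
import numpy as np


class Model:
    """A trivial classifier: always predicts the most frequent training label."""

    def __init__(self):
        self.majority_label = None

    def fit(self, X, y):
        # The ingestion program calls fit() with the training data.
        labels, counts = np.unique(y, return_counts=True)
        self.majority_label = labels[np.argmax(counts)]

    def predict(self, X):
        # Called on the hidden test set; the predictions are then passed
        # to the scoring program and compared with the ground truth.
        return np.full(len(X), self.majority_label)


if __name__ == "__main__":
    # Tiny self-test with made-up data.
    X_train, y_train = np.zeros((6, 3)), np.array([0, 1, 1, 1, 0, 1])
    model = Model()
    model.fit(X_train, y_train)
    print(model.predict(np.zeros((4, 3))))  # -> [1 1 1 1]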
Conclusion
The implementation of the Codalab platform at Paris-Saclay University has enabled the effective organization of machine learning competitions, spanning fields such as health, energy, and ecology. It currently has tens of thousands of users and hosts hundreds of challenges per year. The challenges organized within the “AI for industry” framework with Dassault Aviation and RTE, funded with one million euros by the Île-de-France region, not only stimulated innovation but also had a significant industrial impact. The platform has also benefited teaching at Paris-Saclay University, contributing to the academic influence of the Île-de-France region.
AI is proving to be a powerful tool for solving complex problems in various industrial sectors, confirming the importance of continued investments in this area. Codalab has proven to be a versatile and robust platform for organizing data science competitions, demonstrating its potential for future research and innovation.
Acknowledgements
Codalab Competitions is hosted by the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN) at Université Paris-Saclay and administered by the Codalab governance team, with support from ChaLearn, the Région Île-de-France, and the ANR Chair of Artificial Intelligence HUMANIA (ANR-19-CHIA-0022). We thank the numerous contributors to the Codalab open-source project.
References
[1] CodaLab Competitions: An open source platform to organize scientific challenges
Adrien Pavao, Isabelle Guyon, Anne-Catherine Letournel, Xavier Baró, Hugo Jair Escalante, Sergio Escalera, Tyler Thomas, Zhen Xu, https://hal.science/LISN-AO/hal-03629462v1, 2022.
[2] Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform
Zhen Xu, Sergio Escalera, Isabelle Guyon, Adrien Pavao, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao, https://hal.science/hal-03374222v3, 2022.
[3] Aircraft Numerical "Twin": A Time Series Regression Competition
Adrien Pavao, Isabelle Guyon, Stéphane Nachar, Fabrice Lebeau, Martin Ghienne, Ludovic Platon, Tristan Barbagelata, Pierre Escamilla, Sana Mzali, Meng Liao, Sylvain Lassonde, Antonin Braun, Slim Ben Amor, Liliana Cucu-Grosjean, Marwan Wehaiba, Avner Bar-Hen, Adriana Gogonel, Alaeddine Ben Cheikh, Marc Duda, Julien Laugel, Mathieu Marauri, Mhamed Souissi, Théo Lecerf, Mehdi Elion, Sonia Tabti, Julien Budynek, Pauline Le Bouteiller, Antonin Penon, Raphaël-David Lasseri, Julien Ripoche, Thomas Epalle, https://hal.inria.fr/hal-03463307/, 2021.
[4] Reinforcement learning for Energies of the future and carbon neutrality: a Challenge Design, Gaëtan Serré, Eva Boguslawski, Benjamin Donnot, Adrien Pavão, Isabelle Guyon, Antoine Marot. https://hal.science/hal-03726294v2, 2022.
[5] Méthodologie pour la conception et l'analyse de compétitions en apprentissage automatique (Methodology for the design and analysis of machine learning competitions), PhD thesis, Adrien Pavao, December 2023. https://www.theses.fr/s258689
[6] Designing a Data Science Competition is an Excellent Way to Learn. Adrien Pavao. https://towardsdatascience.com/designing-a-data-science-competition-is-an-excellent-way-to-learn-69e6fd582702
[7] How to Create your First Benchmark on Codabench. Adrien Pavao. https://adrienpavao.medium.com/how-to-create-your-first-benchmark-on-codabench-910e2aee130c
[8] The Tracking Machine Learning challenge: Throughput phase
Sabrina Amrouche, Laurent Basara, Paolo Calafiura, Dmitry Emeliyanov, Victor Estrade, Steven Farrell, Cécile Germain, Vladimir Vava Gligorov, Tobias Golling, Sergey Gorbunov, Heather Gray, Isabelle Guyon, Mikhail Hushchyn, Vincenzo Innocente, Moritz Kiehn, Marcel Kunze, Edward Moyse, David Rousseau, Andreas Salzburger, Andrey Ustyuzhanin, Jean-Roch Vlimant. 2021.
[9] Judging Competitions and Benchmarks: A Candidate Election Approach
Adrien Pavao, Michael Vaccaro, Isabelle Guyon,
https://www.esann.org/sites/default/files/proceedings/2021/ES2021-122.pdf, 2021.
[10] “AI competitions and benchmarks: the science behind the contests”, Chapter 6: Academic competitions
Hugo Jair Escalante, Aleksandra Kruchinina
https://arxiv.org/abs/2312.00268, 2023.
[11] Data Science at the Singularity, David Donoho, Harvard Data Science Review, Jan 2024
https://hdsr.mitpress.mit.edu/pub/g9mau4m0/release/1
[12] Carlens, H, “State of Competitive Machine Learning in 2023”, ML Contests Research, 2024.
https://mlcontests.com/state-of-competitive-machine-learning-2023/
[13] Scientific impact and main outcomes of CodaLab competitions, Aleksandra Kruchinina, internship report, July 2022.
[14] Machine learning scientific competitions and datasets. David Rousseau and Andrey Ustyuzhanin, 2020. https://arxiv.org/abs/2012.08520