HUMANIA

Artificial Intelligence for All

This project is an ANR-funded, four-year chair for the democratization of AI, starting in September 2020.

Synopsis:

With the current rapid growth of AI research and applications, there are both unprecedented opportunities and legitimate worries about its potential misuses. In this context, we are committed to making AI easier to access and use for a broad segment of the population. Making AI more accessible to all should both foster economic growth and help strengthen democracy.

This research aims at reducing the need for human expertise in implementing pattern recognition and modeling algorithms, including Deep Learning, in various fields of application (medicine, engineering, social sciences, physics) and across multiple modalities (images, videos, text, time series, questionnaires). To that end, we will organize scientific competitions (or challenges) in Automated Machine Learning (AutoML). Our designs will expose the community to progressively harder and more diverse settings, steadily reducing the need for human intervention in the modeling process. By involving the scientific community at large in challenge-solving, we will effectively multiply our government funding by a significant factor to solve such hard AutoML problems. All winners' code will be open-sourced. This effort will culminate in an AutoRL (Automated Reinforcement Learning) challenge, in which participants will submit code that will be blind-tested on new RL tasks they have never seen before.

Recognizing that there is no good data-driven AI without good data, we also want to dedicate part of our time to educating the public on proper data collection and preparation. Our objective is to instill good practices that reduce problems resulting from bias in data or from irreproducible results due to insufficient data. We will also encourage the protection of data confidentiality and privacy by supplying software that allows data donors to replace real data with realistic synthetic data. This will help broaden access to confidential or private data that has commercial value or could harm individuals.

Below are two video presentations given at the DATAIA kickoff day (all videos) [SLIDES] and the D2C DATAIA club connection [SLIDES].

Challenges and benchmarks:

We are involved in organizing challenges and benchmarks to stimulate the machine learning community to make progress in Automated Machine Learning and other areas. We are the community lead of the open-source project Codalab Competitions, and we maintain the main public instance of the Codalab Competitions platform and the Codabench benchmark platform:

Competition programs and recent competitions:

Competitions in preparation:

  • MetaDL:

    • Cross-domain meta-learning

    • Meta-learning from learning curves

  • Privacy:

    • Membership inference attacks

  • L2RPN:

    • Energies of the future and carbon neutrality

Conferences

We are involved in the organization of several conferences:

Doctoral theses:

Methodology for Design and Analysis of Competitive Machine Learning Experiments

Adrien Pavao

Started in May 2020. Anticipated completion February 2023.

[half-way report][scholar][dblp]


We propose to develop and study a systematic, unified methodology for organizing and using scientific challenges in teaching and research, particularly in machine learning (data-driven Artificial Intelligence). Challenges are becoming increasingly popular, both as a pedagogical tool and as a means of pushing the state of the art by engaging scientists of all ages, within and outside academia. This can be thought of as a form of citizen science, and it holds the promise of fostering reproducible research and democratizing Artificial Intelligence. Yet, if the data employed (or the metrics used to define tasks) are plagued with bias or artifacts, the outcome of such efforts might be useless at best, and at worst harmful or damaging to the reputation of citizen science. Our endeavor will be to put a formal framework around challenge organization and to provide the community with helpful guidelines. Combined with the challenge-organization tools we are developing in the context of the Codalab project, this promises to be a useful contribution to the community. This thesis will include fundamental theoretical contributions, drawing on experimental design and game theory, and empirical results from A/B testing of alternative challenge protocols in actual competitions.
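Analyzing an A/B test of two challenge protocols ultimately comes down to comparing outcomes between two groups. As a minimal sketch, assuming each protocol arm yields one numeric outcome per participant (for instance a final score), a two-sample permutation test estimates how likely the observed difference in means would be if the protocol had no effect. The function name `permutation_test` and the scores below are hypothetical, not results from actual competitions.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcome
    between two challenge protocols (hypothetical A/B arms)."""
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the protocol labels are exchangeable,
        # so shuffling the pooled outcomes samples the null distribution.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-participant final scores under each protocol.
protocol_a = [0.71, 0.68, 0.74, 0.80, 0.77, 0.69, 0.73]
protocol_b = [0.62, 0.66, 0.60, 0.70, 0.64, 0.59, 0.65]
diff, p = permutation_test(protocol_a, protocol_b)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

A permutation test is chosen here because it makes no normality assumption, which can suit the small samples of a single competition; a real analysis would also need to account for participants self-selecting into challenges.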

Deep Modular Learning

Haozhe Sun

Started February 2021. Anticipated completion January 2024.

OmniPrint: [NeurIPS 2021 poster][github][paper]


The current trend in Artificial Intelligence (AI) is to rely heavily on systems capable of learning from examples, such as Deep Learning (DL) models, a modern embodiment of artificial neural networks. While numerous applications have reached the market in recent years (including self-driving cars; automated assistants, booking services, and chatbots; improvements in search engines, recommendations, and advertising; and health-care applications, to name a few), DL models are still notoriously hard to deploy in new applications. In particular, they require massive numbers of training examples, hours of GPU training, and highly qualified engineers to hand-tune their architectures. This thesis will help lower the barrier to entry for using DL models in new applications, a step towards "democratizing AI".

The angle taken will be to develop new Transfer Learning (TL) approaches based on modular DL architectures. Transfer learning encompasses all techniques that speed up learning by capitalizing on exposure to similar previous tasks. For instance, using pre-trained networks was a key TL tactic of the winners of the recent AutoDL challenge. The doctoral candidate will push forward the notion of reusing pre-trained networks in whole or in part (modularity).
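The simplest way of reusing a pre-trained network is to freeze its body and retrain only the last layer on the new task. The sketch below illustrates that idea under loud assumptions: `frozen_feature_extractor` is a hypothetical stand-in whose hand-picked weights play the role of a pre-trained network, the new task is a toy two-dimensional problem, and only a logistic-regression head is trained.

```python
import math
import random


def frozen_feature_extractor(x):
    """Hypothetical stand-in for a pre-trained network body.
    Its weights are fixed (frozen) and never updated below."""
    weights = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]
    return [max(0.0, w0 * x[0] + w1 * x[1]) for w0, w1 in weights]  # ReLU units


def train_head(examples, labels, epochs=200, lr=0.1):
    """Retrain only the last layer: a logistic-regression head on frozen features."""
    feats = [frozen_feature_extractor(x) for x in examples]  # computed once
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    f = frozen_feature_extractor(x)
    return int(sum(wi * fi for wi, fi in zip(w, f)) + b > 0)


# Hypothetical "new task": label is 1 when x0 + x1 > 0.
rng = random.Random(0)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
ys = [int(x0 + x1 > 0) for x0, x1 in xs]
w, b = train_head(xs, ys)
accuracy = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)
print(f"training accuracy of the retrained head: {accuracy:.2f}")
```

Because the features are computed once and never change, training costs only as much as fitting a linear model, which is why this tactic is attractive when data and compute are scarce; its limitation is that the frozen features must already suit the new task.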

There are several important questions raised in this context.

From a technical standpoint, current limitations of pre-training include the following: (T1) in many domains, no pre-trained networks are available, due to the lack of massive datasets in related domains; (T2) novel network architectures such as "Graph Neural Networks" (GNNs) do not easily lend themselves to pre-training; (T3) beyond merely retraining the last layer and fine-tuning inner layers, ways of reusing pre-trained networks in new contexts are under-developed. These three issues offer challenging research opportunities: making efficient use of prior knowledge, data simulators, and/or data augmentation, and developing novel algorithms and architectures that learn in a modular, reusable way.

From the fundamental research point of view, modularity and the inheritance of pre-trained learning modules in biologically inspired learning systems are a burning topic in AI. Unanswered questions include: (F1) Does the modularity of the brain increase its effectiveness, or is it a legacy of evolution that plays no particular role? (F2) Likewise, in which contexts and how does modularity help artificial systems (e.g., to implement invariances or to help transfer learning)? (F3) Does module specialization hinder or help generalization to new data modalities (e.g., new sensor data), and if so, how?

In this context, the doctoral student will investigate a novel approach to transfer learning that we call "Deep Modular Learning". The candidate will tackle the problem of training large artificial neural networks whose architectures are modular and whose modules are ultimately reusable. One possible approach will be to use multi-level optimization algorithms, optimizing the overall system (to achieve a higher-level objective) under the constraint that the modules achieve a lower-level objective (reusability). One scientific aim will be to challenge the hypothesis that modularity is essential for learning systems, in that it accelerates learning by enabling an effective form of transfer learning, a central functionality in AI.
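In generic form, the multi-level formulation described above can be written as a bilevel program. The notation is an assumption introduced for illustration only: θ stands for system-level parameters (for instance, how modules are composed), φ for the module parameters, F for the higher-level system objective, and f for the lower-level reusability objective.

```latex
\begin{aligned}
\min_{\theta}\quad & F\bigl(\theta,\ \phi^{*}(\theta)\bigr) \\
\text{s.t.}\quad   & \phi^{*}(\theta) \in \arg\min_{\phi}\ f(\theta, \phi)
\end{aligned}
```

The upper level can only be evaluated once the lower-level problem has been solved (exactly or approximately) for the current θ, which is what makes such problems expensive; hyper-parameter optimization is the classic machine learning instance of this structure, and solution strategies are reviewed by Sinha, Malo, and Deb (2017).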

Several principles, conjectures, or hypotheses may guide this research, including: (P1) The principle of parsimony, or "Ockham's razor", embodied in modern learning theory as "regularization", which in layman's terms states that "of two theories equally powerful in reproducing observations, one should prefer the simpler one"; indeed, modular architectures sharing identical sub-modules have fewer adjustable parameters and can therefore be considered less complex than, e.g., fully connected networks. (P2) The innateness hypothesis: task-solving capabilities combine innate and acquired skills. Is it characteristic of intelligent systems to rely more on learned skills, such as language, than on inherited ones? Is it true that language can be learned entirely "from scratch"? (P3) Induction, deduction, conceptualisation, and causality: do intelligent learning systems rely on modularity for conceptualisation, language acquisition, and causal inference?
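The parameter-count argument in (P1) is simple arithmetic. The sketch below assumes, purely for illustration, a stack of identical fully connected width-to-width modules; the function names `dense_params` and `stack_params` are hypothetical.

```python
def dense_params(width_in, width_out):
    """Weights plus biases of one fully connected layer."""
    return width_in * width_out + width_out


def stack_params(width, n_modules, shared):
    """Parameters of n_modules identical width->width layers,
    either with one shared copy of the weights or with independent copies."""
    per_module = dense_params(width, width)
    return per_module if shared else n_modules * per_module


unshared = stack_params(width=64, n_modules=4, shared=False)
shared = stack_params(width=64, n_modules=4, shared=True)
print(unshared, shared)  # 16640 4160
```

Reusing one module k times divides the number of adjustable parameters by k, while the number of operations per forward pass stays the same; parsimony here concerns model complexity, not compute.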

To put this framework into practice, the student will choose applications from domains such as biomedicine (e.g., molecular toxicity or efficacy), ecology, econometrics, speech recognition, natural language processing, and image or video processing.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., & Gall, J. (2019). SemanticKITTI: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE International Conference on Computer Vision (pp. 9297-9307).

Zhou, L., Gao, J., Li, D., & Shum, H. Y. (2020). The design and implementation of xiaoice, an empathetic social chatbot. Computational Linguistics, 46(1), 53-93.

Covington, P., Adams, J., & Sargin, E. (2016, September). Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems (pp. 191-198).

Rajkomar, A., Oren, E., Chen, K., Dai, A. M., Hajaj, N., Hardt, M., ... & Sundberg, P. (2018). Scalable and accurate deep learning with electronic health records. NPJ Digital Medicine, 1(1), 18.

He, K., Girshick, R., & Dollár, P. (2019). Rethinking imagenet pre-training. In Proceedings of the IEEE international conference on computer vision (pp. 4918-4927).

Ellefsen, K. O., Mouret, J. B., & Clune, J. (2015). Neural modularity helps organisms evolve to learn new skills without forgetting old skills. PLoS Comput Biol, 11(4), e1004128.

Rasmussen, C. E., & Ghahramani, Z. (2001). Occam's razor. In Advances in neural information processing systems (pp. 294-300).

Sinha, A., Malo, P., & Deb, K. (2017). A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Transactions on Evolutionary Computation, 22(2), 276-295.

Data-centric Automated Deep Learning

Romain Egele

[DeepHyper][scholar][dblp]

Despite recent successes of Deep Learning in applications [1], designing good (deep) neural networks remains difficult: it requires practical experience, expertise, and, most importantly, a lot of data [2–4]. In the past few years, a great deal of effort has gone into selecting architectures and hyper-parameters and training complex deep networks, through many cycles of trial and error involving many heuristics [5]. The accelerating demand for deep learning solutions naturally calls for better automation of their design [6–15]. Brute-forcing problems by feeding neural networks ever larger datasets is a tempting solution, encouraged by successes obtained by large companies such as Google, whose director of AI research, Peter Norvig, famously said: “We don’t have better algorithms than anyone else; we just have more data.” However, in the wake of scandals revealing ill-advised decisions made by learning machines, the same Peter Norvig revised his opinion: “More data beats clever algorithms, but better data beats more data.” It is thus becoming ever more obvious that machine learning needs a paradigm shift, away from model-centric automated machine learning (AutoML), e.g., neural architecture search and hyper-parameter optimization, and towards the problem of obtaining better data. However, collecting, curating, cleaning, de-biasing, and preprocessing data in all sorts of ways (aka “data wrangling”) may become the human-intensive, de facto bottleneck of AutoML. This thesis aims to make better use, in an automated way, of already collected data to gain significant qualitative and quantitative benefits, an approach recently referred to as “data-centric machine learning” by Andrew Ng. More specifically, the thesis will focus on data-centric deep learning, because (deep) neural networks are notoriously data-hungry.

Several avenues of research will be explored, from the algorithmic, theoretical, and practical points of view:

• New algorithm development. This aspect of the work will include (1) defining metrics for properly transforming, selecting, or generating “good” data from an existing dataset, and metrics for evaluating the benefit of such operations; (2) devising search policies to optimize this process, possibly using Bayesian optimization [16], reinforcement learning [17], genetic algorithms, or a combination thereof; and, progressively moving to more complex cases, (3) data fusion, incorporating multiple datasets from different domains or multiple datasets from the same domain that exhibit covariate shift (a data-centric version of meta-learning). This algorithmic work will build on the doctoral candidate's prior experience, acquired during internships at Argonne National Laboratory (USA) and Université Paris-Saclay, during which he designed and implemented automated deep learning solutions and contributed novel algorithms to the DeepHyper package [18–20].

• Theoretical advances. This aspect of the work will include (1) decomposing sources of errors, noise, and bias in data, extending and/or unifying current concepts [21] of random and epistemic error due to lack of training data (either uniformly missing or depleted in specific regions of data space), aleatoric error due to intrinsic limitations of data (e.g., a bad data representation or preprocessing, or poorly imputed missing values), and systematic error (due to bias in data stemming from poor data collection), with the purpose of grounding in theory methods for filtering, re-balancing, or correcting data; (2) improving generalization error bounds by exploiting the information gained through data augmentation or other data-centric algorithms (e.g., using symmetries in data as well as the duality between model selection and data selection), in the spirit of data-dependent performance bounds [22], and challenging the common simplistic assumption in machine learning that training and validation/test sets must follow the same distribution; (3) analyzing the computational complexity of algorithms and proposing methods to cut down or distribute the computational burden, addressing all the classical issues of high-performance computing, including concurrency, parallelism, (synchronous or asynchronous) communication, memory management, and scaling [23].

• Benchmarks and applications. The student will demonstrate the products of his research on large scientific datasets from Argonne National Laboratory in computational fluid dynamics (sea surface temperature) and cancer research (including molecular footprints of drugs). Additionally, he will use existing benchmarks, in particular the 60 datasets formatted for the AutoDL challenge [24] (in which former doctoral students of LISN were involved), which cover a wide variety of application domains, including biology and medicine, ecology, energy and sustainability management, images of objects, humans, animals, and scenes, aerial images, text, audio, speech, video and other sensor data, internet social media management and advertising, market analysis, and financial prediction. To conduct his experiments, he will have access to the computing infrastructure of Argonne National Laboratory and the Jean Zay supercomputer at Université Paris-Saclay.
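The theoretical work above distinguishes epistemic error (lack of training data) from aleatoric error (intrinsic limitations of the data). For probabilistic classifiers, one widely used way to quantify these two components, swapped in here for illustration and not necessarily the approach the thesis will adopt, relies on an ensemble: the entropy of the averaged prediction (total uncertainty) splits into the average entropy of the members (aleatoric part) plus the members' disagreement, measured by mutual information (epistemic part). The member predictions below are hypothetical; in practice they might come from independently trained networks.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split an ensemble's total predictive uncertainty into an aleatoric part
    (expected member entropy) and an epistemic part (mutual information)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(m) for m in member_probs) / n_members
    epistemic = total - aleatoric  # mutual information, non-negative up to rounding
    return total, aleatoric, epistemic


# Members agree that the input is ambiguous: uncertainty is aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Members are each confident but disagree: uncertainty is mostly epistemic.
disagree = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01]]
print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```

Epistemic uncertainty of this kind is expected to shrink as more relevant data is collected, whereas aleatoric uncertainty is not, which is one reason such a split can guide data-centric decisions such as where to acquire or augment data.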

In the first year, the student will explore techniques of data selection, data sampling, and data augmentation applied to Deep Learning. This first direction is motivated by remarkable results in the recent literature. Notably, boosting methods [25] and importance sampling methods [26,27] both put emphasis on informative examples, and boosting methods such as XGBoost [28] consistently win machine learning competitions [5]. Additionally, recent results on the well-studied CIFAR-10 dataset have shown that data augmentation can yield large performance improvements for state-of-the-art networks such as ResNet20 (from 84% to 92% accuracy). There is thus an opportunity to import boosting techniques into Deep Learning and combine them with data augmentation, furthering pioneering work [17]. This will give the student the opportunity to get acquainted with the problem of data-centric machine learning and to start addressing fundamental questions, such as overcoming various types of errors (random, epistemic, aleatoric, systematic, among others) in an automated manner. He will benchmark existing and newly proposed methods on the AutoDL challenge datasets, which should lead to an initial publication.
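The importance-sampling idea of emphasizing informative examples can be sketched in a few lines. In this simplified illustration, examples are drawn with probability proportional to their current loss (a simple proxy; the cited works [26,27] derive more refined, gradient-based scores), and each drawn example is re-weighted by 1/(N·p_i) so that the weighted mini-batch loss remains an unbiased estimate of the full average loss. The function name `importance_sample` and the loss values are hypothetical.

```python
import random


def importance_sample(losses, batch_size, rng):
    """Draw a mini-batch with probability proportional to each example's loss.

    Returns (indices, weights); weight_i = 1 / (N * p_i) makes the weighted
    mean of the sampled losses an unbiased estimate of the full mean loss.
    """
    n = len(losses)
    total = sum(losses)
    probs = [loss / total for loss in losses]
    indices = rng.choices(range(n), weights=probs, k=batch_size)  # with replacement
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


# Hypothetical per-example losses: a few hard examples dominate.
losses = [0.05] * 90 + [2.0] * 10
rng = random.Random(0)
idx, w = importance_sample(losses, batch_size=32, rng=rng)
hard_fraction = sum(i >= 90 for i in idx) / len(idx)
estimate = sum(wi * losses[i] for i, wi in zip(idx, w)) / len(idx)
print(f"fraction of hard examples in batch: {hard_fraction:.2f}")
print(f"weighted loss estimate: {estimate:.3f} (true mean: {sum(losses) / len(losses):.3f})")
```

Because the sampling probabilities here are exactly proportional to the losses, every weighted term equals the mean loss and the estimate is exact; with a cheaper proxy score it stays unbiased but becomes noisy. Examples with a zero score are never drawn, so a common safeguard is to mix in some uniform sampling.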

The student will then conduct an extensive survey of data-centric machine learning and identify critical issues to be addressed. The survey will also lead to a publication.

Subsequent directions to be pursued in the rest of the thesis may include: combining data selection and model selection; involving human feedback in data selection and data cleaning; exploring issues of bias in data and fairness; using data simulators and calibrating them; explaining model decisions through the selection of representative or extreme examples; treating multi-view or multi-modal data; and scaling up methods to the supercomputers Theta and Aurora (Argonne National Laboratory, USA) and Jean Zay (GENCI, France) to provide applied scientists (e.g., chemists, physicists) with efficient, end-to-end, data-oriented learning algorithms. We anticipate that this line of research will have considerable impact in the many application areas in which Deep Learning has proven effective. It will enable the efficient training of such models on datasets of limited size or low quality, or on data fused from many sources, collected in various conditions and potentially not well calibrated. Given the current imbalance in the research community's effort between model-centric and data-centric approaches, we aim to contribute to a paradigm shift and hopefully make seminal contributions to this domain, which is still in its infancy.

References

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.

[2] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, M. Hasan, B. C. Van Essen, A. A. Awwal, and V. K. Asari, “A state-of-the-art survey on deep learning theory and architectures,” Electronics, vol. 8, no. 3, p. 292, 2019.

[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.

[4] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting unreasonable effectiveness of data in deep learning era,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 843–852.

[5] “Anthony Goldbloom gives you the secret to winning Kaggle competitions.” [Online]. Available: https://www.kdnuggets.com/2016/01/anthony-goldbloom-secret-winning-kaggle-competitions.html

[6] K. Swersky, J. Snoek, and R. P. Adams, “Freeze-thaw Bayesian optimization,” arXiv preprint arXiv:1406.3896, 2014.

[7] M. Feurer et al., “Efficient and robust automated machine learning,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 2962–2970. [Online]. Available: http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf

[8] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.

[9] B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing Neural Network Architectures using Reinforcement Learning,” arXiv:1611.02167 [cs], Nov. 2016. [Online]. Available: http://arxiv.org/abs/1611.02167

[10] A. Klein, S. Falkner, S. Bartels, P. Hennig, and F. Hutter, “Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), ser. Proceedings of Machine Learning Research, vol. 54. PMLR, Apr. 2017, pp. 528–536. [Online]. Available: http://proceedings.mlr.press/v54/klein17a.html

[11] F. Assunção, N. Lourenço, P. Machado, and B. Ribeiro, “Evolving the topology of large scale deep neural networks,” in European Conference on Genetic Programming. Springer, 2018, pp. 19–34.

[12] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. Le, and A. Kurakin, “Large-Scale Evolution of Image Classifiers,” arXiv:1703.01041 [cs], Mar. 2017, arXiv: 1703.01041. [Online]. Available: http://arxiv.org/abs/1703.01041

[13] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune, “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning,” CoRR, vol. abs/1712.06567, 2017. [Online]. Available: http://arxiv.org/abs/1712.06567

[14] H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable architecture search.” [Online]. Available: http://arxiv.org/abs/1806.09055

[15] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized evolution for image classifier architecture search.” [Online]. Available: http://arxiv.org/abs/1802.01548

[16] S. Falkner, A. Klein, and F. Hutter, “BOHB: Robust and efficient hyperparameter optimization at scale.” [Online]. Available: http://arxiv.org/abs/1807.01774

[17] E. D. Cubuk, B. Zoph, D. Mané, V. Vasudevan, and Q. V. Le, “Autoaugment: Learning augmentation policies from data,” CoRR, vol. abs/1805.09501, 2018. [Online]. Available: http://arxiv.org/abs/1805.09501

[18] P. Balaprakash, R. Egele, M. Salim, S. Wild, V. Vishwanath, F. Xia, T. Brettin, and R. Stevens, “Scalable reinforcement-learning-based neural architecture search for cancer deep learning research,” pp. 1–33. [Online]. Available: http://arxiv.org/abs/1909.00311

[19] R. Maulik, R. Egele, B. Lusch, and P. Balaprakash, “Recurrent neural network architecture search for geophysical emulation.” [Online]. Available: http://arxiv.org/abs/2004.10928

[20] R. Egele, P. Balaprakash, V. Vishwanath, I. Guyon, and Z. Liu, “AgEBO-Tabular: Joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data.” [Online]. Available: http://arxiv.org/abs/2010.16358

[21] M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, V. Makarenkov, and S. Nahavandi, “A review of uncertainty quantification in deep learning: Techniques, applications and challenges,” Information Fusion, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253521001081

[22] V. Vapnik, Statistical Learning Theory, ser. A Wiley-Interscience publication. Wiley, 1998. [Online]. Available: https://books.google.fr/books?id=GowoAQAAMAAJ

[23] T. Ben-Nun and T. Hoefler, “Demystifying parallel and distributed deep learning: An in-depth concurrency analysis,” ACM Computing Surveys (CSUR), vol. 52, no. 4, pp. 1–43, 2019.

[24] Z. Liu, A. Pavao, Z. Xu, S. Escalera, F. Ferreira, I. Guyon, S. Hong, F. Hutter, R. Ji, J. C. Junior, G. Li, M. Lindauer, L. Zhipeng, M. Madadi, T. Nierhoff, K. Niu, C. Pan, D. Stoll, S. Treger, W. Jin, P. Wang, C. Wu, Y. Xiong, A. Zela, and Y. Zhang, “Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.

[25] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771–780, 1999.

[26] A. Katharopoulos and F. Fleuret, “Not all samples are created equal: Deep learning with importance sampling,” in Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.

[27] T. B. Johnson and C. Guestrin, “Training deep models faster with robust, approximate importance sampling,” in Advances in Neural Information Processing Systems 31, 2018.

[28] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” CoRR, vol. abs/1603.02754, 2016. [Online]. Available: http://arxiv.org/abs/1603.02754