Introduction to Automated Machine Learning with Auto-Sklearn ⚙️

Simon Provost
15 min read · May 28, 2022


According to the book “Hands-On Automated Machine Learning”, AutoML `aims to ease the process of building ML models by automating commonly-used steps, such as feature preprocessing, model selection, and hyperparameter tuning`. Naturally, this high level of automation enables non-experts to use machine learning models and approaches without extensive prior familiarity with machine learning. Regarding trends, AutoML was the leading trend in the ML industry and research community in 2019. See the graph below: AutoML sits in the “Peak of Inflated Expectations” phase, which indicates that early publicity generates a number of success stories. Some businesses act immediately, while others wait until the technology delivers a demonstrable benefit. Nonetheless, the trend is apparent, and the community has high expectations for the technology.

Regardless of the reasons why people are so enthused about AutoML, it is irrefutable, in my humble opinion, that AutoML will significantly alter the way people work in data science research and industry in the coming years. As a data scientist at LittleBigCode, I would gladly use AutoML daily for two specific reasons: (1) it lets me focus on the more mind-challenging aspects of a machine learning project while AutoML works in the background for hours to find the optimal combination; and (2) it lets me extend my current knowledge by discovering algorithms I am unfamiliar with, whenever AutoML's analysis concludes that one of them is the most accurate. It also enhances the credibility of a project by demonstrating that numerous combinations have been evaluated on the dataset.

Figure — Hype cycle for emerging technologies, with AutoML in the Peak of Inflated Expectations. Source: [30]

In the following article, we introduce AutoML in general and then Auto-Sklearn in particular: its definition, followed by an illustration and a summary of the topic. Auto-Sklearn has been selected over many others, such as MLBox, TPOT, H2O AutoML, Auto-KERAS (Neural Architecture Search), TransmogrifAI, Auto-WEKA, JAD-BIO, Auto-PYTORCH (Neural Architecture Search), etc., because it has been recognised as a powerful framework in the community across numerous ML competitions. This does not imply that one framework is superior to another, but the community surrounding Auto-Sklearn is comparable to that of Scikit-learn, which is why it makes sense to point AutoML beginners to a tool as accessible as Scikit-Learn. However, each of the above-mentioned tools, as well as the remaining ones on the market, is worth investigating to determine its applicability to your use case. Auto-Sklearn is mostly used for classification and regression and does not employ deep learning; for more information, see below.

What defines Automated Machine Learning?

The notion of Automated Machine Learning has been debated in the literature; nonetheless, the following definition, in terms of both its description and its long-term goal, is sufficient for new and experienced researchers in the field:

AutoML aims to automatically compose and parametrise machine learning algorithms into machine learning pipelines, with the objective of optimising a given metric. Typically, the sub-pipelines of a basic AutoML system are connected to two components: the first is preprocessing (feature selection, transformation, imputation, etc.) and the second is algorithm selection (classification, regression, etc.). The state of the art includes the two most popular supervised automated machine learning frameworks, Auto-Sklearn (based on Scikit-Learn [1]) and Auto-WEKA (based on WEKA [2]) [3, 4]. A supervised AutoML system is mathematically described as follows:
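
In symbols (a reconstruction from the definitions below, in the spirit of the CASH formulation of [4, 11]):

$$
f^{*} = \operatorname*{arg\,min}_{\upsilon,\ \theta_{\upsilon},\ \Phi,\ \theta_{\Phi}} \ \mathcal{L}\Big(\upsilon_{\theta_{\upsilon}}\big(\Phi_{\theta_{\Phi}}(X)\big),\ D\Big)
$$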

Where f is the task's best generalisation; f is also identical to what we refer to as a Full Model or Pipeline [5, 6]. The pipeline is composed of 𝜐, which denotes the supervised learning algorithm (e.g., XGBoost, Random Forest, etc.), and θ𝜐, which denotes that algorithm's hyperparameters. Φ denotes the preprocessing technique (e.g., feature resampling/imputation, etc.), and, if applicable, θΦ the hyperparameters associated with the chosen preprocessing technique. Finally, an AutoML system attempts to identify the optimal combination of preprocessing technique and learning algorithm, as well as their respective hyperparameters, given a dataset, denoted D.

In a nutshell, an AutoML search-optimisation system aims to perform (1) the optimisation of estimators and predictors (i.e., algorithm selection) [7]; (2) the optimisation of learning algorithms and their hyperparameters (i.e., hyperparameter optimisation) [8, 9, 6, 4]; and (3) the optimisation of meta-learning algorithms [10, 3]. Preprocessing techniques are subject to the same optimisation process, but over a subset of techniques designed specifically for preprocessing.

What are the subproblems of AutoML? (CASH / HPO / SMAC / NAS / AO)

Now that AutoML has been explained, let's take a quick look at the subproblems it solves to reach its main goal of optimising model selection and hyperparameters:

Combined Algorithm and Hyperparameter Selection (CASH) and Hyperparameter Optimisation (HPO)

The CASH problem [11] is concerned with automatically and simultaneously selecting a learning algorithm and its hyperparameters, whereas the HPO problem is concerned with producing the best feasible model instance from a vector of selected algorithms. The two are therefore naturally combined. In a nutshell, the CASH procedure treats the choice of algorithm itself as a hyperparameter, optimising over this joint space to provide a set of the best algorithms for the given dataset. The HPO step, in turn, takes CASH's best outputs, a pipeline of algorithms and their hyperparameters, and attempts to tune each set of hyperparameters to its best feasible instance.

The CASH and HPO problems require testing a large number of hypotheses and selecting the most accurate one as the best predictive model for the given training set. Consider, for example, that most forest-based algorithms (Decision Tree, Random Forest, XGBoost, Deep Forest, etc.) have at least ten hyperparameters, each of which can take on ten distinct values; exhaustively checking the CASH and HPO configuration space for a single such algorithm therefore requires 10¹⁰ evaluations. Consequently, tuning n algorithms with j hyperparameters each can be quite costly.
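
To make the "algorithm as a hyperparameter" idea concrete, here is a minimal sketch of a joint CASH configuration space, written against the legacy ConfigSpace API (the configuration-space library auto-sklearn builds on); the algorithm choices and value ranges are illustrative assumptions, not auto-sklearn's actual search space:

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)
from ConfigSpace.conditions import EqualsCondition

cs = ConfigurationSpace(seed=1)

# The choice of learning algorithm is itself a (categorical) hyperparameter.
clf = CategoricalHyperparameter("classifier", ["random_forest", "linear_svm"])
# Algorithm-specific hyperparameters, only active when their algorithm is chosen.
n_estimators = UniformIntegerHyperparameter("rf:n_estimators", 10, 1000, default_value=100)
svm_c = UniformFloatHyperparameter("svm:C", 0.01, 100.0, log=True, default_value=1.0)

cs.add_hyperparameters([clf, n_estimators, svm_c])
cs.add_condition(EqualsCondition(n_estimators, clf, "random_forest"))
cs.add_condition(EqualsCondition(svm_c, clf, "linear_svm"))

# Each sample is one candidate (algorithm, hyperparameters) pair.
print(cs.sample_configuration())
```

An optimiser such as SMAC (next section) then searches this joint space rather than enumerating all 10¹⁰ combinations exhaustively.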

Sequential Model-based Algorithm Configuration (SMAC)

SMAC [18, 19, 20] is a versatile HPO tool that helps algorithm designers optimise hyperparameters. The strategy builds promising configurations using a tree-based surrogate model and local search, races candidate configurations against the incumbent using a Random Online Adaptive Racing (ROAR) procedure [12], and finally reports the most accurate hyperparameter combination discovered for the algorithm and the given dataset.
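
To give a feel for how such a tool works, below is a minimal sketch of a surrogate-model optimisation loop in the SMAC spirit. It is a simplification under stated assumptions: a toy one-dimensional objective and a random-forest surrogate queried for its plain mean prediction, whereas real SMAC uses an expected-improvement criterion and racing against the incumbent.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def objective(config):
    # Toy stand-in for "train a model with this config, return validation error".
    return (np.log10(config["learning_rate"]) + 2.0) ** 2 + rng.normal(scale=0.05)

def sample_configs(n):
    # Draw learning rates log-uniformly from [1e-4, 1].
    return [{"learning_rate": 10 ** rng.uniform(-4, 0)} for _ in range(n)]

# Bootstrap the surrogate with a few random evaluations.
X_hist, y_hist = [], []
for config in sample_configs(5):
    X_hist.append([np.log10(config["learning_rate"])])
    y_hist.append(objective(config))

surrogate = RandomForestRegressor(n_estimators=50, random_state=0)
for _ in range(20):
    surrogate.fit(X_hist, y_hist)
    # Propose candidates, then actually evaluate the one the surrogate predicts best.
    candidates = sample_configs(100)
    preds = surrogate.predict([[np.log10(c["learning_rate"])] for c in candidates])
    best = candidates[int(np.argmin(preds))]
    X_hist.append([np.log10(best["learning_rate"])])
    y_hist.append(objective(best))

print("Best error found:", min(y_hist))
```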

Neural Architecture Search (NAS) and Architecture Optimisation (AO)

The NAS problem is to design a high-performing neural architecture by selecting and combining basic operations [13, 14, 15, 16]. Where CASH aims to pick the best algorithm and hyperparameters, NAS searches for the best neural network architecture for a particular dataset. AO, on the other hand, is a subcomponent of NAS and can be regarded as NAS's optimisation method. It was first demonstrated in an application by Barret Zoph and Quoc Le [27], who used reinforcement learning to train a recurrent neural network to automatically search for the best-performing architecture. AO approaches include reinforcement learning, evolution-based algorithms, gradient descent, surrogate-model-based optimisation, and hybrid methods [17]. AO determines NAS's ideal design by optimising a given architecture (i.e., number of layers, learning rate, etc.); roughly, AO is to NAS what HPO is to CASH (i.e., the optimisation method). Promising frameworks combining NAS and AO include Auto-KERAS and Auto-PyTorch.

Now that the jargon has been defined: Auto-Sklearn is built upon CASH/HPO and SMAC, so we will focus primarily on these in the remainder of the article:

What is Auto-Sklearn?

The original release of the Auto-Sklearn package in 2015, by the AI lab at the University of Freiburg, sought to enhance Bayesian optimisation through meta-learning. The framework employs 15 classifiers, 14 methods for feature preprocessing, and 3 methods for data preprocessing, with a total of 132 hyperparameters. There are two versions of the framework: 1.0 [18] and 2.0 [19]. The primary contribution of the first version was its CASH and HPO performance, whereas the primary improvement of the second version was the incorporation of a simpler and more effective approach to meta-learning.

Awards

Auto-Sklearn received its first award at the first international AutoML Challenge, held between 2015 and 2016, where it outperformed rival frameworks on some sub-challenges, though not all, and still placed first overall [20, 21, 22, 23]. The team then had a few months to improve the framework so it would remain competitive the following cycle, and it paid off: in 2017–2018 they also won the second international AutoML competition [24, 25, 26].

Pre-configured algorithms to search over

The following is the list of pre-configured algorithms available in Auto-Sklearn:

AdaBoost; Bernoulli naive Bayes; decision tree; extremely randomised trees; Gaussian naive Bayes; gradient boosting; k-nearest neighbours; LDA; linear SVM; kernel SVM; multinomial naive Bayes; passive aggressive; QDA; random forest; linear classification.

Its architecture

Before delving deeper into the architecture, note that Auto-Sklearn requires a number of parameters, including two distinct thresholds. One is the threshold that, when reached, stops the tuning of a given algorithm (i.e., HPO); the other, which can be considered the global threshold, bounds the overall process of finding algorithms (i.e., CASH). Now, on to the architecture. In a nutshell: (1) the user supplies the pipeline with raw data, divided into training and testing sets. (2) The meta-learning phase is then executed; this is one of the greatest advancements of this framework in the AutoML field, as it, roughly speaking, measures the similarity of your dataset to datasets already known from the literature/web, and, if there is a match, a list of techniques that performed well on such a dataset is passed as the priority for the pipeline to investigate. (3) Then, regardless of what the meta-learning step outputs, we enter the optimisation cycle: (a) we randomly select a data preprocessor, (b) we randomly select a feature preprocessor, and (c) we randomly select a classifier, and then we use the Bayesian optimiser to tune their hyperparameters until the sub-pipeline threshold is reached. This cycle is repeated for each available classifier until the overall threshold is reached, at which point the pipeline stops and (4) builds an ensemble of all sub-pipeline combinations, ranking them from most accurate to least accurate based on a user-defined metric. As a result, the user is provided with the best model and/or test-set prediction probabilities for the classes.
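
As a rough sketch of that cycle (illustrative pseudocode only: suggest_candidates, bayesian_optimise, and build_ensemble are hypothetical helper names, not Auto-Sklearn's real internals):

```python
import time

def automl_search(X_train, y_train, global_budget, per_run_budget):
    # (2) Meta-learning warm start: an iterator of prioritised sub-pipelines.
    candidates = suggest_candidates(X_train, y_train)
    results = []
    start = time.time()
    while time.time() - start < global_budget:        # global (CASH) threshold
        # (3a-c) One data preprocessor, feature preprocessor, and classifier.
        data_prep, feat_prep, clf = next(candidates)
        # HPO on this sub-pipeline until the per-run threshold is reached.
        score, config = bayesian_optimise(
            (data_prep, feat_prep, clf), X_train, y_train, budget=per_run_budget
        )
        results.append((score, config))
    # (4) Ensemble of the evaluated sub-pipelines, ranked by the chosen metric.
    return build_ensemble(results)
```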

Simple use case: breast cancer classification using Auto-Sklearn

Auto-Sklearn is presently only accessible via Python, so we will assume that you already have it installed; regarding installation and OS compatibility, see here. In addition, we load data from scikit-learn's bundled datasets, so this article involves no GDPR-sensitive data handling on our end.

Consider an example to illustrate how simple it is to use this framework. Here, we will classify the breast cancer dataset, a classic and extremely straightforward binary classification dataset (see more):

We first load our data into partitions with, by default, 75 percent for the training set and 25 percent for the test set. Afterward, we instantiate the Auto-Sklearn classifier (version 1.0 here) and use three distinct parameters, though more may be found in the documentation (a code sketch follows the list below):

  • Time left for this task: maximum number of seconds allowed for the entire pipeline search. By increasing this value, auto-sklearn has a greater likelihood of discovering superior models. This is also the global threshold: if you enter 54,000 seconds, the procedure will search for the optimal model for the next fifteen hours. However, think carefully about this value, because searching for too long can overfit your data (see [28] for more on overfitting and AutoML).
  • Per run time limit: time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough that typical machine learning algorithms can be fitted on the training data. Note that if this parameter is set too high, the overall pipeline may try fewer classifiers, because the preceding global limit may be hit first. As a user, you have to find a good balance.
  • Memory limit: memory limit in MB for the machine learning algorithm. Auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB.
  • Note that the documentation contains numerous other arguments such as what metric to optimise in the pipeline, etc.
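
A minimal sketch of both steps (loading the data, then instantiating the classifier with the three parameters above), assuming auto-sklearn 1.0's documented API:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import autosklearn.classification

# Load the breast cancer dataset; train_test_split holds out 25% by default.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # global budget: 5 minutes for the whole search
    per_run_time_limit=30,        # budget for any single model fit
    memory_limit=3072,            # MB allowed per fitting run
)
automl.fit(X_train, y_train)
```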

Finally, you are ready to retrieve the optimal output of your pipeline search; note that, for demonstration purposes, we simply output the accuracy of the champion model here. Nonetheless, the documentation describes a plethora of more specific outputs you can obtain:
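
A minimal sketch of scoring the result (the statistics and leaderboard helpers below exist in recent auto-sklearn versions; check your version's documentation):

```python
from sklearn.metrics import accuracy_score

# Score the champion ensemble on the held-out test set.
y_pred = automl.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))

# Optional summaries of the search and the ranked models it found.
print(automl.sprint_statistics())
print(automl.leaderboard())
```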

Discussion and conclusion

AutoML will play a role in the future of machine learning, but it is not without drawbacks. A few observations regarding AutoML:

  • AutoML will replace data scientists. Numerous practitioners feel that AutoML will replace them in the near future, which leads some to reject the discipline. However, I have my doubts about this being the case. I believe that we, as humans, are ultimately the ones who decide which results are viable enough to be produced and deployed, so we will not be replaced by Automated Machine Learning; rather, we will use it as a tool in our daily activities and find new things to focus on in our respective professions.
  • Data is still the most important thing. You could feed any raw data into the pipeline, but without preprocessing or additional data understanding, there would be no good results. This can be extremely inconvenient for non-specialists such as medical practitioners. Even though they may have some data preprocessing training, it may not be as extensive as that of an ML practitioner. From this standpoint, AutoML remains somewhat complex for non-experts.
  • The prediction metric is the only aim. In the data science profession, the prediction model is not always the only thing desired. For instance, if you wish to determine which features of your dataset are the most significant, AutoML is currently incapable of producing such a list out of the box; tricks exist, but they are not easy for non-specialists. As a final example, if you want the champion model's explainability, it is not yet available by default, even though it is essential for non-experts such as medical practitioners. Still, keep in mind that a pipeline profiler for Auto-Sklearn was recently researched and published [29].
  • Replacing my good old random forest. Perhaps, but not necessarily. As good practice, it is recommended to always launch a random forest with 1,000 trees and otherwise default parameters, in order to establish a baseline against which to compare the output of your AutoML pipeline (see the sketch after this list). AutoML may or may not replace it, particularly if the system you employ is unable to forecast your events accurately.
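
A quick sketch of that baseline, reusing the train/test split from the example above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1,000 trees, otherwise default parameters: the reference to beat.
baseline = RandomForestClassifier(n_estimators=1000, random_state=1)
baseline.fit(X_train, y_train)
print("Baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```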

Despite these limitations, applying this branch of ML in your missions and projects can be extremely beneficial. As a closing remark, I hope you now understand the AutoML terminology and the workings of Auto-Sklearn, a significant framework among numerous others, and I hope you will investigate AutoML further. However, I would like to point out that there is still future work to be done in the AutoML field, particularly with imbalanced data, which is common in the medical field but not yet well handled by AutoML, as well as methods for multi-label learning as opposed to binary classification, which presents an even greater challenge. Consequently, there are still a number of uncharted sub-areas in AutoML, and this is just the beginning of a robust subfield of machine learning.

If you have any questions or would like to share your experience, please feel free to leave a comment 🥳

Simon 🔬

References

[1] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[2] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The weka data mining software: an update,” ACM SIGKDD explorations newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[3] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter, “Auto-sklearn: efficient and robust automated machine learning,” in Automated Machine Learning. Springer, Cham, 2019, pp. 113–134.

[4] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto-weka: Combined selection and hyperparameter optimization of classification algorithms,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013, pp. 847–855.

[5] H. J. Escalante, “Towards a particle swarm model selection algorithm,” in Multi-level Inference Workshop and Model Selection Game, NIPS, Whistler, BC, Canada, 2006.

[6] H. J. Escalante, M. Montes, and L. E. Sucar, “Particle swarm model selection.” Journal of Machine Learning Research, vol. 10, no. 2, 2009.

[7] J. R. Rice, “The algorithm selection problem,” in Advances in Computers. Elsevier, 1976, vol. 15, pp. 65–118.

[8] D. Gorissen, T. Dhaene, and F. De Turck, “Evolutionary model type selection for global surrogate modeling,” Journal of Machine Learning Research, vol. 10, pp. 2039–2078, 2009.

[9] Q. Sun, B. Pfahringer, and M. Mayo, “Full model selection in the space of data mining operators,” in Proceedings of the 14th annual conference companion on genetic and evolutionary computation, 2012, pp. 1503–1504.

[10] K. A. Smith-Miles, “Cross-disciplinary perspectives on meta-learning for algorithm selection,” ACM Computing Surveys (CSUR), vol. 41, no. 1, pp. 1–25, 2009.

[11] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto-weka: Combined selection and hyperparameter optimization of classification algorithms,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013, pp. 847–855.

[12] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle, “ParamILS: an automatic algorithm configuration framework,” Journal of Artificial Intelligence Research, vol. 36, pp. 267–306, 2009.

[13] M. F. Tenorio and W.-T. Lee, “Self organizing neural networks for the identification problem,” 1988, pp. 57–64. [Online]. Available: https://papers.nips.cc/paper/149-self-organizing-neural-networks-for-the-identification-problem

[14] H. Kitano, “Designing neural networks using genetic algorithms with graph generation system,” vol. 4, no. 4, 1990. [Online]. Available: http://www.complex-systems.com/abstracts/v04_i04_a06/

[15] P. J. Angeline, G. M. Saunders, and J. B. Pollack, “An evolutionary algorithm that constructs recurrent neural networks,” vol. 5, no. 1, pp. 54–65, 1994. [Online]. Available: https://ieeexplore.ieee.org/document/265960/

[16] S. R. Young, D. C. Rose, T. P. Karnowski, S.-H. Lim, and R. M. Patton, “Optimizing deep learning hyper-parameters through an evolutionary algorithm,” 2015, pp. 4:1–4:5. [Online]. Available: https://dl.acm.org/citation.cfm?id=2834896

[17] X. He, K. Zhao, and X. Chu, “AutoML: A survey of the state-of-the-art,” Knowledge-Based Systems, vol. 212, p. 106622, Jan. 2021. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0950705120307516

[18] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc., 2015. [Online]. Available: https://proceedings.neurips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf

[19] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Auto-sklearn 2.0: The next generation,” arXiv preprint arXiv:2007.04074, 2020.

[20] I. Guyon, L. Sun-Hosoya, M. Boullé, H. J. Escalante, S. Escalera, Z. Liu, D. Jajetic, B. Ray, M. Saeed, M. Sebag, A. Statnikov, W. Tu, and E. Viegas, “Analysis of the automl challenge series 2015–2018,” in AutoML, ser. Springer series on Challenges in Machine Learning, 2019.

[21] I. Guyon, K. Bennett, G. Cawley, H. J. Escalante, S. Escalera, T. K. Ho, N. Macià, B. Ray, M. Saeed, A. Statnikov, and E. Viegas, “Design of the 2015 ChaLearn AutoML challenge,” in Proc. of IJCNN, 2015. [Online]. Available: http://www.causality.inf.ethz.ch/AutoML/automl_ijcnn15.pdf

[22] ——, “AutoML challenge 2015: Design and first results,” in Proc. of AutoML 2015@ICML, 2015. [Online]. Available: https://drive.google.com/file/d/0BzRGLkqgrI-qWkpzcGw4bFpBMUk/view

[23] I. Guyon, I. Chaabane, H. J. Escalante, S. Escalera, D. Jajetic, J. R. Lloyd, N. Macía, B. Ray, L. Romaszko, M. Sebag, A. Statnikov, S. Treguer, and E. Viegas, “A brief review of the chalearn automl challenge,” in Proc. of AutoML 2016@ICML, 2016. [Online].

[24] “AutoML 2018 challenge :: PAKDD2018,” March 2020. [Online]. Available: https://competitions.codalab.org/competitions/17767

[25] I. Guyon, L. Sun-Hosoya, M. Boullé, H. Escalante, S. Escalera, Z. Liu, D. Jajetic, B. Ray, M. Saeed, M. Sebag et al., Analysis of the AutoML Challenge series 2015–2018, 2017. [Online]. Available: https://www.automl.org/book/

[26] W. Tu and E. Viegas, “Analysis of the AutoML challenge series 2015–2018.”

[27] Zoph, B. and Le, Q.V., 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.

[28] Fabris, F. and Freitas, A.A., 2019, September. Analysing the overfit of the auto-sklearn automated machine learning tool. In International Conference on Machine Learning, Optimization, and Data Science (pp. 508–520). Springer, Cham.

[29] Ono, J.P., Castelo, S., Lopez, R., Bertini, E., Freire, J. and Silva, C., 2020. Pipelineprofiler: A visual analytics tool for the exploration of automl pipelines. IEEE Transactions on Visualization and Computer Graphics, 27(2), pp.390–400.
