What Is AutoML and How Can I Use It?
The no free lunch theorem
The “no free lunch” (NFL) theorem by David Wolpert and William Macready implies that no single machine learning algorithm performs best on all problems. So when you tackle a new problem with machine learning (ML), you don’t know in advance which model to use. You have to try different models, compare their outcomes, and choose the best one for the problem at hand. Needless to say, this can be a tedious process.
What is AutoML?
Automated machine learning, or AutoML for short, helps automate the more tedious parts of the machine learning process.
As mentioned above, one of the more tedious parts involves trying a bunch of different machine learning models, comparing them, and finally selecting the best one. At times there could be 10 or 15 different models you could use. If you want to do that by hand, you have to write a lot of code to try each model, calculate some score, compare the scores, and then choose the best one.
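To make the manual loop concrete, here is a minimal sketch that uses only the Python standard library. The dataset, the three toy "models", and every function name are made up for illustration; a real project would use a library such as scikit-learn and far more data. The sketch tries each candidate, scores it with leave-one-out cross-validation, and keeps the highest scorer.

```python
# Toy 1-D dataset, made up for illustration: (feature, label).
# The point (2.2, 1) is deliberate label noise inside the class-0 region.
DATA = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0), (2.2, 1),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1)]


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Predict 1 when x lies above the midpoint of the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    cut = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x > cut)


def fit_nearest_neighbor(train):
    """Predict the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def leave_one_out_accuracy(fit, data):
    """Hold out each point in turn, train on the rest, and score the guess."""
    correct = 0
    for i, (x, y) in enumerate(data):
        model = fit(data[:i] + data[i + 1:])
        correct += model(x) == y
    return correct / len(data)


# Hypothetical candidate list; an AutoML tool would ship its own, larger one.
CANDIDATES = {
    "majority": fit_majority,
    "threshold": fit_threshold,
    "nearest_neighbor": fit_nearest_neighbor,
}

scores = {name: leave_one_out_accuracy(fit, DATA)
          for name, fit in CANDIDATES.items()}
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Even with three tiny models this is a fair amount of code, and every extra candidate adds more. That bookkeeping is the part AutoML tools take over.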
Another tedious part is hyperparameter search, that is, tuning an algorithm’s settings so that it performs better. AutoML can also help automate some of the analysis and optimize some of the pre-processing steps of the data (but more on that later).
Instead of doing all of this manually, AutoML tools let you do the same things faster with a few lines of code or an out-of-the-box GUI.
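As a rough illustration of hyperparameter search, the sketch below tunes the number of neighbors k in a hand-written k-nearest-neighbors classifier. The dataset, the grid of k values, and the function names are assumptions chosen for the example, not taken from any AutoML tool. It scores every value in the grid with leave-one-out cross-validation and keeps the best one, which is the simplest form of a grid search.

```python
# Toy 1-D dataset, made up for illustration: (feature, label).
# The point (2.2, 1) is deliberate label noise inside the class-0 region.
DATA = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0), (2.2, 1),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1)]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return int(2 * votes_for_one > k)


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn and predict it from the remaining points."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == y
    return correct / len(data)


GRID = [1, 3, 9]  # hypothetical candidate values for the hyperparameter k
scores = {k: leave_one_out_accuracy(DATA, k) for k in GRID}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On this toy data the smallest k is thrown off by the single noisy point and the largest k drowns out the smaller class, so the middle value wins. Real searches cover many more settings, which is why automating them saves so much time.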
AutoML and farming analogy
A few hundred years ago, farming was very manual. You had to go pick things out of the ground by hand. Now, a lot of farming is mechanized. There are even people doing robotic strawberry picking, which is really complex.
AutoML is just like that: an extension of our tools that simplifies picking the most suitable machine learning model.
The benefits of AutoML
As you might have guessed, AutoML provides an efficiency boost. It allows you to get to a decent baseline machine learning model very quickly. Instead of fiddling around with tuning and trying different models, you can quickly get to one that is tuned and is the best of a selection of models.
Let’s take a concrete example. Say you have an image classification task, such as taking pictures of dogs and classifying their breed. The output of the machine learning algorithm would then be the breed of the dog in each picture.
If you try that project from scratch, or even with some of the more modern machine learning or neural network tools out there, it takes a really long time, and you might not succeed even after spending many hours or days on it.
Whereas if you use an AutoML solution, you can drop that work down to a few hours and are likely to get good results as long as your data is good.
Another benefit of AutoML is that you can use it without being proficient in Python, R, Spark, etc. There are versions out there with an intuitive GUI that can get you up and running rather quickly.
In conclusion, the benefits of AutoML are that it is a big time saver and that it makes machine learning accessible to non-experts.
Enterprise AutoML options:
- Google AutoML
- Xpanse Analytics
- Einstein Prediction Builder
Open source (some with paid tiers) AutoML options:
- Auto-WEKA: an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package, it automatically yields good models for a wide variety of data sets.
- Auto-sklearn: an extension of Auto-WEKA using the Python library scikit-learn. It provides a drop-in replacement for regular scikit-learn classifiers and regressors.
- Auto-PyTorch: based on the deep learning framework PyTorch. It jointly optimizes hyperparameters and the neural architecture.
- AutoGluon: a multi-layer stacking approach of different ML models.
- H2O AutoML: provides automated model selection and ensembling for the H2O machine learning and data analytics platform.
- MLBox: deals with preprocessing, optimization and prediction.
- TPOT: a data-science assistant which optimizes machine learning pipelines using genetic programming.
- TransmogrifAI: an AutoML library running on top of Spark.
Things to watch out for
You should not treat AutoML as a magic black box. Don’t use it on your problem when you know absolutely nothing about machine learning, because you might run into problems that stem from technical issues, such as over-fitting or various bad data problems, and not recognize them.
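One basic check for over-fitting is to compare a model’s score on the data it was trained on with its score on data it has not seen. The sketch below does this for a hand-written 1-nearest-neighbor model on a made-up dataset; the data and names are illustrative assumptions, and the same idea applies to whatever model an AutoML tool hands back.

```python
# Toy 1-D dataset, made up for illustration: (feature, label).
# The point (2.2, 1) is deliberate label noise inside the class-0 region.
DATA = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0), (2.2, 1),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1)]


def nearest_neighbor_predict(train, x):
    """Predict the label of the training point closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


# Accuracy on the very points the model has memorized.
train_hits = sum(nearest_neighbor_predict(DATA, x) == y for x, y in DATA)
train_accuracy = train_hits / len(DATA)

# Accuracy on unseen points: hold each one out and predict it from the rest.
held_out_hits = 0
for i, (x, y) in enumerate(DATA):
    rest = DATA[:i] + DATA[i + 1:]
    held_out_hits += nearest_neighbor_predict(rest, x) == y
held_out_accuracy = held_out_hits / len(DATA)

gap = train_accuracy - held_out_accuracy
print(f"train={train_accuracy:.2f} held-out={held_out_accuracy:.2f} gap={gap:.2f}")
```

A perfect training score next to a clearly lower held-out score is a typical sign that the model has memorized noise. Spotting this gap takes only a little machine learning knowledge, but the tool will not necessarily point it out for you.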
Yes, it does lower the barrier to entry in terms of how much knowledge you need. You don’t have to be a coding expert, as there are even AutoML versions with a GUI, but you should still know a little bit of machine learning basics.
Unfortunately, with AutoML you still have to watch out for pre-processing. Yes, there are some AutoML programs, such as the TPOT Python library, that will actually take you from raw data all the way to a machine learning model. They can help you try a bunch of combinations of pre-processing steps, but they are still limited.
So when you’re working with machine learning, even if you make use of AutoML, you still need to pay attention to your pre-processing and have a little bit of domain knowledge about your data.
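To show why pre-processing deserves attention, here is a small sketch of one common step, feature scaling. The two training rows, the query, and the helper names are hypothetical, and min-max scaling is just one option picked for the example. A nearest-neighbor lookup gives a different answer before and after scaling, because the income column, measured in dollars, otherwise swamps the age column.

```python
import math

# Hypothetical training rows: ((age in years, income in dollars), label).
TRAIN = [((25.0, 50000.0), 0), ((60.0, 51000.0), 1)]
QUERY = (58.0, 50300.0)


def nearest_label(train, query):
    """Label of the training row with the smallest Euclidean distance."""
    return min(train, key=lambda row: math.dist(row[0], query))[1]


def min_max_scaler(rows):
    """Build a function that rescales each feature to the range seen in rows.

    Assumes every feature takes at least two distinct values in rows.
    """
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    spans = [max(column) - min(column) for column in columns]
    return lambda point: tuple(
        (value - low) / span for value, low, span in zip(point, lows, spans)
    )


# On raw features the distance is dominated by income.
raw_prediction = nearest_label(TRAIN, QUERY)

# After scaling, age and income contribute on a comparable footing.
scale = min_max_scaler([features for features, _ in TRAIN])
scaled_train = [(scale(features), label) for features, label in TRAIN]
scaled_prediction = nearest_label(scaled_train, scale(QUERY))

print("raw:", raw_prediction, "scaled:", scaled_prediction)
```

Which of the two answers is the right one depends on what the features mean, and that is where domain knowledge comes in.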
Which AutoML option to go for
There is a trade-off. If you’re using an enterprise and/or GUI-based option, that will be the fastest and easiest way to use an AutoML solution. At the same time, it will also be less customizable.
If you choose a standalone executable or open source solution, you can usually customize anything you want. You can squeeze that last bit of performance out of it, but it will usually take longer to implement.
You also have to consider the features of the AutoML solution as not all deliver the same benefits. Some can help automate data pre-processing and some do not. Plus not all of them contain the same models.
Lastly, some versions come with an associated cost.
If you’re in the field of data science and machine learning, you should definitely be learning AutoML and using it. Watch for its pitfalls and shortcomings while using it to speed up the deliverables of your machine learning project. If you’d like to learn more, check out this Lights On Data Show episode on “Getting Started with AutoML”, from which some of the above content was drawn.