Power of Machine Learning with AutoML

Data-driven insights give modern businesses a much-needed edge over their competition. However, the techniques used to generate such insights are complex enough that many businesses are still reluctant to take the steps needed to put their data to work. For example, advanced computational approaches like data science and machine learning have driven many breakthroughs in data analytics and forecasting, but the technical expertise required to take advantage of them keeps many people from enjoying the true power of data. 

What if there were a way to abstract away the complexity of machine learning so that people without advanced technical knowledge could build, train, and deploy machine learning models? This is exactly what AutoML, or automated machine learning, delivers to its users. 

AutoML first emerged back in 2013, and its popularity started growing with the introduction of Google’s AutoML solution in 2018. Today, the AutoML market has expanded significantly, with many paid and open-source solutions that automatically handle several, if not all, stages of the machine learning workflow. 

With AutoML where it is today, the question is: how can you use AutoML tools to benefit from the value machine learning adds to data processing? Let’s answer that by first understanding what exactly AutoML is. 

What exactly is AutoML?

A typical machine learning pipeline consists of several stages to prepare data and train models. Listed below are the main steps usually followed in machine learning to build predictive and analytical models. 

  1. Data extraction: Collect data from a variety of sources to prepare a dataset for training the model.
  2. Data exploration: Understand the nature of the dataset and its properties through aggregations and visualizations. 
  3. Data preparation (preprocessing): Filter and clean the collected data samples to prepare an unbiased dataset for model training and evaluation. 
  4. Feature engineering: Transform the raw data into features that allow the model to better understand the problem it’s trying to solve. 
  5. Model selection: Identify the type of machine learning problem the dataset represents (e.g., regression, classification, or clustering) and build an appropriate model architecture to solve that problem. 
  6. Model training & evaluation: Train the model on the prepared training dataset and evaluate its performance on the test dataset.
  7. Hyperparameter tuning: Find an optimal set of hyperparameters that allows the selected model to achieve the best results.
  8. Model deployment: Deploy the trained ML model to production. 

In this pipeline, steps 3-7, especially, demand advanced technical expertise and domain knowledge that is often beyond those who have not specialized in machine learning. For this reason, other than a select group of technical experts and engineers, machine learning has been a difficult technology to adopt for many businesses. 

Components of the machine learning pipeline

AutoML, on the other hand, just as its name suggests, provides a machine learning pipeline that automates a number of the steps involved, from data exploration to model deployment, making machine learning accessible even to those without a technical background. Ideally, it builds a direct pathway between data extraction and model deployment, handling all intermediate steps on its own. While not all AutoML solutions on the market today automate every one of these steps, they do support the most challenging parts of the ML pipeline, such as feature engineering, model selection, and hyperparameter tuning. 
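
To make the idea concrete, here is a minimal sketch of the kind of search loop an AutoML tool automates, written in plain Python. It is a toy under stated assumptions, not how any particular product works: the synthetic dataset, the two candidate model families (k-nearest neighbors and nearest centroid), and the helper names `make_dataset`, `cross_val_accuracy`, and `auto_select` are all invented for illustration.

```python
import random
from collections import Counter
from statistics import mean


def make_dataset(n=120, seed=0):
    """Two noisy 2-D clusters labeled 0 and 1 (synthetic, for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, point, _unused=None):
    """Assign the label whose class centroid is closest to `point`."""
    best_label, best_dist = None, float("inf")
    for label in {lbl for _, lbl in train}:
        members = [p for p, lbl in train if lbl == label]
        cx = mean(p[0] for p in members)
        cy = mean(p[1] for p in members)
        dist = (cx - point[0]) ** 2 + (cy - point[1]) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_val_accuracy(data, predict, param, folds=4):
    """Average accuracy over `folds` interleaved train/test splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct = sum(predict(train, p, param) == y for p, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def auto_select(data):
    """Try every (model, hyperparameter) candidate and keep the best one."""
    candidates = [("knn", knn_predict, k) for k in (1, 3, 5, 9, 15)]
    candidates.append(("centroid", centroid_predict, None))
    results = [
        (cross_val_accuracy(data, predict, param), name, param)
        for name, predict, param in candidates
    ]
    return max(results, key=lambda r: r[0]), results
```

Calling `auto_select(make_dataset())` returns the best (score, model name, hyperparameter) triple along with the full list of results. Real AutoML tools search far larger spaces with smarter strategies, but the overall shape of the loop is similar: propose a candidate, score it on held-out data, and keep the winner.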

AutoML is still a new branch of artificial intelligence that has a long way to go before replacing the experience-honed knowledge of actual data scientists. Even so, AutoML already provides a competitive alternative to human-built ML pipelines for building production-ready, scalable machine learning models in a short time and at a very low cost. 

AutoML tools available for your use 

Today, dozens of paid and open-source AutoML tools are available on the market for developing ML models. While some of them cater to specific areas like image processing, deep learning, and natural language processing, others support a wider range of problems with automated workflows. In this section, let’s look at some of the popular AutoML tools to understand the services and the level of automation they offer to users. 

Google Cloud AutoML

Google’s AutoML is, in fact, a set of products designed for specific types of ML problems. These products include:

  • AutoML Image: Automated machine learning for object detection and image classification. 
  • AutoML Text: AutoML for text-based tasks like intent classification and sentiment analysis. 
  • AutoML Translation: AutoML for dynamically detecting and translating languages. 
  • AutoML Tabular: AutoML for problems with structured, tabular data. 
  • Vertex AI: A unified platform for building ML models for different types of problems. 

These products provide end-to-end machine learning solutions: you input a preprocessed dataset and obtain a production-ready model. They automatically carry out feature engineering tasks such as encoding categorical values and feature selection. 

In addition to the usual machine learning and deep learning solutions, the platform supports transfer learning to improve model performance. In such cases, a model already trained on a Google-owned dataset is fine-tuned to suit your specific needs. This is especially advantageous when working with text, image, and video data. 

Google AutoML uses Neural Architecture Search, one of the popular architecture building techniques in AutoML, to come up with a suitable model architecture. The platform also tries out an ensemble of the top-performing models to obtain better results than any individual model. 
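
The ensembling idea can be illustrated with a small sketch. The article does not say how Google combines its top models, so the code below uses plain majority voting as a stand-in, and the three prediction lists are hand-written, hypothetical outputs rather than results from real models.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model prediction lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical true labels and the outputs of three imperfect models.
truth = [1, 0, 1, 1, 0, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 1, 0, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c),
                    ("ensemble", ensemble)]:
    print(name, accuracy(preds, truth))
```

In this constructed example each model is wrong on different samples, so the vote corrects every individual mistake. That is the favorable case; when the models tend to make the same errors, an ensemble gains much less.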

Vertex AI dashboard (source: cloud.google.com)

Google AutoML is undoubtedly one of the best AutoML solutions in the market. With Google’s dedication to advancing research in this field, it’s set to become even more powerful in the future. The only downside to relying on Google for your AutoML needs is the high cloud charges associated with the Google Cloud Platform. 

H2O AutoML

H2O AutoML provides an open-source, automated machine learning solution for tabular data. It automates parts of the ML pipeline from feature engineering to hyperparameter tuning, incorporating some of the latest developments in the field in its process. 

For example, it supports encoding categorical values using target encoding and trains a set of predetermined models with hyperparameter tuning, including XGBoost gradient boosting machines (GBMs), a default Random Forest, deep neural nets, and H2O-built GBMs. It also evaluates the performance of two stacked ensembles of these models. Finally, it uses the best-performing model of all to generate predictions for the dataset. 
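
For readers unfamiliar with target encoding, the following plain-Python sketch shows the core idea: each category is replaced by the average target value observed for it, optionally pulled toward the global average. This is a simplified illustration, not H2O's implementation; the function name `target_encode`, the `smoothing` parameter, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=1.0):
    """Map each category to a smoothed mean of the target values seen for it.

    With smoothing=0 this is the plain per-category mean; larger values pull
    rare categories toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


# Hypothetical data: customer city and whether the customer bought (1) or not (0).
cities = ["paris", "paris", "rome", "rome", "rome", "oslo"]
bought = [1, 1, 0, 1, 0, 1]

encoding = target_encode(cities, bought, smoothing=0.0)
smoothed = target_encode(cities, bought, smoothing=1.0)
encoded_column = [smoothed[city] for city in cities]
```

Because the encoding is computed from the target itself, production implementations usually compute it on held-out folds to avoid leaking the label into the features; the sketch leaves that out for brevity.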

As an open-source solution, H2O AutoML provides an affordable alternative to Google AutoML. Its inability to work with unstructured data like text is one of the major downsides of the platform. 
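
As a rough sketch of what a run looks like from Python, the function below uses the `h2o` package's `H2OAutoML` class. The function name `run_h2o_automl`, its parameters, and the assumption that the task is classification on a CSV file are illustrative choices, and running it requires the `h2o` package and a Java runtime.

```python
def run_h2o_automl(train_csv, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return (leaderboard, best model)."""
    # Imported here so that defining the function does not require h2o.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster
    frame = h2o.import_file(train_csv)

    # Treat the target as categorical so H2O runs a classification task.
    frame[target] = frame[target].asfactor()
    features = [col for col in frame.columns if col != target]

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=features, y=target, training_frame=frame)

    # The leaderboard ranks all trained models; the leader is the top entry.
    return aml.leaderboard, aml.leader
```

The returned leader can then be used for predictions with `leader.predict(frame)` on an H2O frame that has the same feature columns.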

H2O AutoML dashboard (source: h2o.ai)

Auto-Sklearn

Auto-Sklearn is built on top of the popular data science library, Scikit-learn. It supports feature engineering techniques such as one-hot encoding, principal component analysis, and standardization. It then performs model selection and hyperparameter tuning with built-in Scikit-learn models for classification and regression problems. 

Auto-Sklearn incorporates several recent ML developments, like meta-learning and Bayesian optimization, to build models competitive with human-created ones. 
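
A minimal usage sketch, assuming the `auto-sklearn` and `scikit-learn` packages are installed, might look like the following. The function name `run_auto_sklearn`, the choice of the breast cancer sample dataset, and the time budgets are illustrative assumptions, not recommendations.

```python
def run_auto_sklearn(time_budget_s=120):
    """Fit Auto-Sklearn on a sample dataset; return (model, test accuracy)."""
    # Imported here so that defining the function does not require the packages.
    import autosklearn.classification
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,   # total search budget (seconds)
        per_run_time_limit=time_budget_s // 4,   # cap for a single candidate
    )
    # Preprocessing, model selection, and tuning all happen inside fit().
    automl.fit(X_train, y_train)

    return automl, accuracy_score(y_test, automl.predict(X_test))
```

The fitted object behaves like a regular Scikit-learn estimator, so it can be dropped into existing code that expects `fit` and `predict`.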

AutoKeras

AutoKeras is built on top of Python’s Keras deep learning library to produce automated deep learning models. As an open-source library, it provides a tool anyone can afford and use, especially for working with unstructured data like images and text. It uses Neural Architecture Search to find the best neural architecture for a given problem. At the end of the training and evaluation process, you can export the best-performing model as a Keras model to deploy in your system environment as needed. 
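
The workflow can be sketched in a few lines, assuming the `autokeras` package is installed. The function name `run_autokeras` and the small default values for `max_trials` and `epochs` are illustrative assumptions chosen to keep a first run short.

```python
def run_autokeras(x_train, y_train, max_trials=3, epochs=5):
    """Search for an image classifier and return the best one as a Keras model."""
    # Imported here so that defining the function does not require autokeras.
    import autokeras as ak

    # max_trials limits how many candidate architectures are tried.
    clf = ak.ImageClassifier(max_trials=max_trials, overwrite=True)
    clf.fit(x_train, y_train, epochs=epochs)

    # export_model() hands back the best model found as a Keras model.
    return clf.export_model()
```

Here `x_train` and `y_train` are expected to be arrays of images and labels, for example the MNIST arrays that ship with Keras. The returned object is a Keras model, so it can be saved and served like any other.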

Takeaway

As a branch of artificial intelligence becoming increasingly popular in the technology world, AutoML brings machine learning closer to even those without prior technical knowledge. If you aren’t already using machine learning to process data flowing through your business to draw valuable insights because you don’t have the technical expertise, AutoML gives you the perfect alternative to start enjoying the real power hidden in data-driven insights. 

 
