Supervised learning is the most widely used branch of machine learning.
It requires that our training data has “labels”.
A label is the output value we are trying to predict, often called the “y-variable”.
When the label is a numeric value, this is called regression.
Regression algorithms work by learning a formula that calculates the numeric output from the input features.
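As a minimal sketch of this idea, the snippet below fits a straight line y = a·x + b by ordinary least squares. The data (house sizes and prices) is invented purely for illustration:

```python
# Fit y = a*x + b by ordinary least squares on made-up example data
# (house size in square metres -> price).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope a = covariance(x, y) / variance(x); intercept b = mean_y - a*mean_x.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

sizes = [50, 70, 100, 120]     # inputs (features)
prices = [150, 210, 300, 360]  # labels (the numeric y-values)
a, b = fit_line(sizes, prices)
predict = lambda x: a * x + b
print(predict(80))  # predicted price for an unseen 80 m^2 house -> 240.0
```

Once the formula (here, the slope and intercept) has been learned, predicting for a new input is just plugging it into that formula.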
When the label is a “category” that the example belongs to, this is called classification.
Classification algorithms build a model that predicts the probability of each category, given an input example.
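One very simple model of this kind is k-nearest neighbours: the predicted probability of each category is its share among the k closest labeled examples. The data below (animal heights and species) is invented for illustration:

```python
from collections import Counter

# A k-nearest-neighbour classifier that returns a probability for each
# category, based on the labels of the k closest training examples.
def predict_proba(train, labels, x, k=3):
    # Sort training points by distance to x and keep the k nearest labels.
    nearest = sorted(zip(train, labels), key=lambda p: abs(p[0] - x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}

heights = [150, 155, 160, 180, 185, 190]
species = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(predict_proba(heights, species, 158))  # -> {'cat': 1.0}
print(predict_proba(heights, species, 170))  # mixed probabilities near the boundary
```

The output is a probability per category rather than a single hard answer; picking the most probable category turns it into a final prediction.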
Some algorithms, including those used in ensemble methods and deep learning, can perform classification and regression.
Ensemble methods work by building many models and allowing them to vote on the answer.
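The sketch below illustrates the voting idea with one common flavour, bagging: many simple "decision stump" models are each trained on a random resample of the data, and their majority vote gives the final answer. The data (exam score vs. pass/fail) and the stump-training rule are assumptions for the example:

```python
import random
from collections import Counter

# Train a "decision stump": pick the score threshold that best separates
# pass from fail on the given (resampled) data.
def train_stump(data):
    best = None
    for threshold in {score for score, _ in data}:
        correct = sum(1 for score, label in data
                      if (score >= threshold) == (label == "pass"))
        if best is None or correct > best[1]:
            best = (threshold, correct)
    return lambda score, t=best[0]: "pass" if score >= t else "fail"

random.seed(0)
data = [(35, "fail"), (42, "fail"), (55, "pass"), (61, "pass"), (70, "pass")]
# Build many models, each on a bootstrap resample of the training data.
models = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def vote(score):
    # Let every model predict, then take the majority answer.
    return Counter(m(score) for m in models).most_common(1)[0][0]

print(vote(40), vote(65))  # fail pass
```

Because each model sees a slightly different sample, their individual mistakes tend to cancel out in the vote.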
Deep learning utilizes neural networks, which are loosely inspired by how biological neurons function: each artificial neuron computes a weighted sum of its inputs and passes the result through an activation function.
There are also some types of specialized neural networks which excel at specific tasks.
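A single artificial neuron, and a tiny network of them, can be sketched in a few lines. The weights, biases, and inputs below are arbitrary illustrative values, not a trained model:

```python
import math

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1 / (1 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, passed through an activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# A tiny two-layer network: two hidden neurons feeding one output neuron.
def tiny_network(x1, x2):
    h1 = neuron([x1, x2], [0.5, -0.6], 0.1)
    h2 = neuron([x1, x2], [-0.3, 0.8], 0.0)
    return neuron([h1, h2], [1.2, -0.7], 0.2)

print(tiny_network(1.0, 0.0))  # a value between 0 and 1
```

"Training" such a network means adjusting the weights and biases so the outputs match the labels; deep models simply stack many such layers, which is where the heavy data and compute requirements come from.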
Deep learning is very widely used today, but it has some downsides. Specifically, large models require massive amounts of training data and significant computational resources.
“Big data” and “cloud computing” have mitigated part of this problem. However, obtaining large quantities of high-quality labeled data remains one of the biggest barriers in machine learning today.