*To see this with example code, check out my Kaggle Notebook.*
Decision Trees are a very flexible algorithm and can be used in supervised or unsupervised contexts, for both classification and regression.
This post is about the “supervised” “regression” version.
Supervised meaning we use labeled data to train the model.
Regression meaning we predict a numerical value instead of a "class".
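To make that concrete, here is a tiny dataset of the kind a regression tree could be trained on: square footage as the lone predictor and sale price as the numeric target (both the scenario and the numbers are invented purely for illustration).

```python
import numpy as np

# Hypothetical labeled training data: square footage as the predictor,
# sale price (in thousands) as the numeric target we want to predict.
X = np.array([[750], [900], [1100], [1400], [1800], [2300]], dtype=float)
y = np.array([150, 180, 210, 255, 300, 360], dtype=float)
```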
Decision Trees split our data into subcategories, which we then use to predict some output variable.
They shine when working with datasets that have multiple "predictor" variables, since such datasets are notoriously hard to visualize.
To create these separations, the algorithm tests candidate splits and keeps the one that maximizes "information gain". For classification trees that means reducing "entropy"; for the regression trees covered here, it typically means reducing the variance (or mean squared error) of the target within each resulting subset.
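As a sketch of what that split search can look like for regression (the function names, toy data, and single-feature simplification are my own, not from the notebook), the code below scores every candidate threshold on one feature by how much it reduces the variance of the target:

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Weighted drop in variance from splitting y into a left and a right group."""
    y_left, y_right = y[left_mask], y[~left_mask]
    if len(y_left) == 0 or len(y_right) == 0:
        return 0.0
    n = len(y)
    weighted = (len(y_left) / n) * y_left.var() + (len(y_right) / n) * y_right.var()
    return y.var() - weighted

def best_split(x, y):
    """Try a threshold at each feature value (except the largest); keep the best."""
    best_threshold, best_gain = None, 0.0
    for threshold in np.unique(x)[:-1]:
        gain = variance_reduction(y, x <= threshold)
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Toy data: square footage vs. price in thousands (invented for illustration).
x = np.array([750, 900, 1100, 1400, 1800, 2300], dtype=float)
y = np.array([150, 180, 210, 255, 300, 360], dtype=float)
print(best_split(x, y))  # best threshold is 1100.0, with a variance drop of 3906.25
```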
Once the algorithm determines the first split, this becomes the “root node”.
Next, it evaluates the resulting subsets of data, and creates another split based on information gain.
It continues with this process until hitting some predefined stopping point, such as a maximum depth or minimum sample size per subset.
With the splitting complete, the average of the target values is calculated for each "leaf node".
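Putting the recursion, the stopping rules, and the leaf averages together, a stripped-down builder might look like the sketch below (again a single-feature simplification under my own assumptions; nested dictionaries are just one convenient way to represent the nodes):

```python
import numpy as np

def build_tree(x, y, depth=0, max_depth=3, min_samples=2):
    """Recursively split until a stopping rule fires; leaves store the target mean."""
    # Stopping rules: maximum depth reached, or too few samples left to split.
    if depth >= max_depth or len(y) < min_samples:
        return {"leaf": True, "value": y.mean()}

    # Score every candidate threshold on this (single) feature by variance reduction.
    best_threshold, best_gain = None, 0.0
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        weighted = (left.sum() * y[left].var() + (~left).sum() * y[~left].var()) / len(y)
        gain = y.var() - weighted
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain

    # If no split improves things, this subset becomes a leaf as well.
    if best_threshold is None:
        return {"leaf": True, "value": y.mean()}

    left = x <= best_threshold
    return {
        "leaf": False,
        "threshold": best_threshold,
        "left": build_tree(x[left], y[left], depth + 1, max_depth, min_samples),
        "right": build_tree(x[~left], y[~left], depth + 1, max_depth, min_samples),
    }

# Build a small tree on the same toy data used above.
x = np.array([750, 900, 1100, 1400, 1800, 2300], dtype=float)
y = np.array([150, 180, 210, 255, 300, 360], dtype=float)
tree = build_tree(x, y)
```

Each recursive call works on a smaller subset of the data, so the dictionaries nest into the familiar tree shape, and the recursion bottoms out exactly when one of the stopping rules fires.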
Now we can pass in new examples and predict their output: each example is routed down the tree according to the splits until it reaches a leaf, and that leaf's average becomes the prediction.
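In practice you would normally lean on a library implementation rather than hand-rolling the tree. Assuming scikit-learn is available, its DecisionTreeRegressor follows the same fit-then-predict pattern, with the stopping rules exposed as constructor arguments:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same invented toy data: square footage as the predictor, price (in thousands) as the target.
X = np.array([[750], [900], [1100], [1400], [1800], [2300]], dtype=float)
y = np.array([150, 180, 210, 255, 300, 360], dtype=float)

# The stopping rules from the text map onto constructor arguments.
model = DecisionTreeRegressor(max_depth=3, min_samples_leaf=1)
model.fit(X, y)

# Route new examples down the fitted tree and return each leaf's average.
print(model.predict(np.array([[1000.0], [2000.0]])))
```

Because every prediction is just a leaf's average, the model's output is piecewise constant: two new examples that land in the same leaf receive exactly the same predicted value.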
Overall, Decision Trees are highly flexible and intuitive, but not the most accurate on their own. However, they are the foundation of incredibly powerful ensemble methods such as Random Forests and Gradient Boosting.