Feature Scaling is the data preprocessing step of converting numeric input columns to a common scale.
This is critical because many algorithms are sensitive to the scale of their inputs: features with large magnitudes dominate the learning process, while features with small magnitudes are effectively ignored, regardless of their true importance.
There are two common approaches to feature scaling: Standardization & Normalization.
For standardization, we subtract the column’s mean from each value and then divide by the column’s standard deviation, so each column ends up with a mean of 0 and a standard deviation of 1. Because the result is not squeezed into a fixed range, this is the better choice when the data contains important outliers.
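A minimal sketch of standardization using scikit-learn’s StandardScaler (the feature values below are hypothetical, just to show the effect):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: income in dollars, age in years
X = np.array([[45_000, 25],
              [82_000, 41],
              [150_000, 38],
              [39_000, 52]], dtype=float)

scaler = StandardScaler()          # applies (x - mean) / std per column
X_std = scaler.fit_transform(X)

print(X_std.mean(axis=0))          # ~[0, 0]
print(X_std.std(axis=0))           # ~[1, 1]
```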
For normalization (min-max scaling), we subtract the column’s minimum from each value and divide by the column’s range, squishing the values to fit between 0 and 1. A single extreme outlier stretches that range and compresses every other value into a narrow band, so this approach is not recommended when outliers are present.
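The same sketch with min-max scaling via scikit-learn’s MinMaxScaler (again with hypothetical values):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same hypothetical features as above
X = np.array([[45_000, 25],
              [82_000, 41],
              [150_000, 38],
              [39_000, 52]], dtype=float)

scaler = MinMaxScaler()            # applies (x - min) / (max - min) per column
X_norm = scaler.fit_transform(X)

print(X_norm.min(axis=0))          # [0, 0]
print(X_norm.max(axis=0))          # [1, 1]
```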
When scaling features during the training process, an important nuance arises around the train/test split. Because we won’t know the distribution of new incoming data, we must fit the scaler on the training set only and then apply those same statistics to the validation & test sets, in order to properly simulate how our model will perform in the real world.
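A short sketch of that workflow, assuming synthetic data just for illustration: fit_transform learns the mean and standard deviation from the training set, and transform reuses those statistics on the test set without refitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: income and age drawn from a random distribution
rng = np.random.default_rng(0)
X = rng.normal(loc=[50_000, 40], scale=[20_000, 10], size=(100, 2))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test set
```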