*To see this post with a code example, check out my Kaggle Notebook.

Lasso and Ridge regression, also known as L1 and L2 respectively, are “regularization” techniques.

The goal of regularization is to improve how well the model generalizes to new data by accepting a small increase in “bias” in exchange for a larger reduction in “variance”. This is done by adding a penalty that scales with model complexity.

Applying this to linear regression, we start with a line through our data.

We calculate the residuals as usual.

Then the penalty is calculated. For Lasso, the penalty scales with the absolute value of the slope, and for Ridge it scales with the slope squared. In both cases the penalty is multiplied by a tuning parameter (often called lambda or alpha) that controls how strongly it is applied.

The penalty is added to the sum of the squared residuals, and the algorithm then minimizes this penalized sum instead of the plain least-squares sum.
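
As a minimal sketch of what that penalized sum looks like for a single-feature fit (the names `x`, `y`, `slope`, `intercept`, and the penalty strength `lam` are my own, not from the notebook):

```python
import numpy as np

def penalized_loss(x, y, slope, intercept, lam, penalty="ridge"):
    """Sum of squared residuals plus a regularization penalty on the slope."""
    residuals = y - (slope * x + intercept)
    ssr = np.sum(residuals ** 2)           # ordinary least-squares term
    if penalty == "lasso":                 # L1: penalty scales with |slope|
        return ssr + lam * np.abs(slope)
    return ssr + lam * slope ** 2          # L2 (Ridge): penalty scales with slope squared
```

With `lam = 0` this is just ordinary least squares; larger values of `lam` push the best slope closer to zero.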

The result is a best-fit line with a smaller slope that will hopefully fit our test data better.
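
For example, here is a rough scikit-learn sketch comparing the slope of an ordinary least-squares fit to a Ridge fit on the same toy data (the data and the `alpha` value are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=8).reshape(-1, 1)    # a small training set
y = 2.0 * x.ravel() + rng.normal(0, 3, size=8)   # noisy linear relationship

ols = LinearRegression().fit(x, y)
ridge = Ridge(alpha=10.0).fit(x, y)              # alpha controls the penalty strength

print("OLS slope:  ", ols.coef_[0])
print("Ridge slope:", ridge.coef_[0])            # shrunk toward zero relative to OLS
```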

This is particularly useful when working with a small amount of training data.

This is also powerful with higher-dimensional data, as the penalty is calculated using the coefficients of all of the predictive variables.

With Ridge regression, the influence of unnecessary variables is minimized, and with Lasso regression their coefficients can actually drop to zero, removing them from the model altogether.
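
A quick way to see that difference is to fit both models on data where one feature carries no signal; in this hypothetical sketch (data and `alpha` values invented for illustration), Lasso will typically push the irrelevant coefficient all the way to zero, while Ridge only shrinks it:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, size=50)   # only the first feature matters

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefficients:", ridge.coef_)   # second coefficient small but nonzero
print("Lasso coefficients:", lasso.coef_)   # second coefficient typically exactly 0.0
```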

Overall, both regularization techniques help reduce overfitting, especially with small datasets or those with many variables.