*To see this post with a code example, check out my Kaggle Notebook.*
Lasso and Ridge regression, also known as L1 & L2 respectively, are “regularization” techniques.

The goal of regularization is to improve how well the model generalizes to new data: we accept a small increase in “bias” in exchange for a larger reduction in “variance”, by adding a penalty that scales with model complexity.


Applying this to linear regression, we start with a line through our data.

We calculate the residuals as usual.

Then the penalty is calculated. For Lasso, the penalty scales with the absolute value of the slope; for Ridge, it scales with the slope squared. In both cases the penalty is multiplied by a tuning parameter, lambda, that controls how aggressively the model is regularized.

The penalty is added to the sum of squared residuals, and the algorithm then finds the line that minimizes this combined cost (see the formulas below).
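
One way to write those pieces out, using β₁ for the slope, ŷ for the line's predictions, and λ for the tuning parameter introduced above:

$$
\text{SSR} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
$$

$$
\text{Lasso cost} = \text{SSR} + \lambda \left|\beta_1\right|
\qquad\qquad
\text{Ridge cost} = \text{SSR} + \lambda \, \beta_1^2
$$

When λ = 0, both reduce to ordinary least squares; as λ grows, a steep slope becomes increasingly expensive.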

The result is a best-fit line with a smaller slope, one that will hopefully fit our test data better.
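
The full worked example is in the Kaggle notebook; as a stand-in, here is a minimal sketch using scikit-learn. The synthetic data, seed, and `alpha` values (scikit-learn's name for lambda) are my own illustrative choices:

```python
# Fit the same small, noisy dataset three ways and compare the slopes.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(8, 1))            # deliberately tiny training set
y = 3.0 * X.ravel() + rng.normal(0, 4, size=8) # true slope is 3, plus noise

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=10.0)),   # alpha plays the role of lambda
                    ("Lasso", Lasso(alpha=10.0))]:
    model.fit(X, y)
    print(f"{name:6s} slope = {model.coef_[0]:.3f}")
```

The regularized slopes come out smaller than the plain least-squares slope, which is exactly the shrinking effect described above.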

This is particularly useful when working with a small amount of training data.

This is also powerful with higher-dimensional data, as the penalty is calculated from the coefficients of all the predictor variables.

With Ridge regression, the influence of unnecessary variables is minimized; with Lasso regression, their coefficients can actually drop to zero, removing them from the model altogether.
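
A quick sketch of that difference (again with made-up data: 10 features, only 3 of which actually matter):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
true_coef = np.array([5.0, -3.0, 2.0] + [0.0] * 7)  # 7 irrelevant features
y = X @ true_coef + rng.normal(0, 1, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefs:", np.round(ridge.coef_, 2))  # small but nonzero everywhere
print("Lasso coefs:", np.round(lasso.coef_, 2))  # irrelevant features typically exactly 0.0
```

Ridge shrinks every coefficient toward zero but keeps all ten in the model, while Lasso zeroes out the irrelevant ones, performing feature selection for free.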

Overall, both regularization techniques help reduce overfitting, especially with small datasets or those with many variables.