*To see this with sample code, check out my Kaggle notebook.*
Like simple and multiple regression, polynomial regression is a “supervised” “regression” algorithm.

Supervised, meaning we use labeled data to train the model.

Regression, meaning we predict a numerical value instead of a “class”.

However, polynomial regression differs in that it allows us to fit curved lines/planes to our data.

This is accomplished by raising our variable to powers greater than 1 and including each power as an additional term in the model.

This makes it a fantastic modeling tool for datasets with curved, nonlinear relationships.
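
For example (a minimal NumPy sketch with made-up numbers), a degree-3 expansion of a single variable x just adds x² and x³ as new columns:

```python
import numpy as np

# A single predictor variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Degree-3 expansion: each power of x becomes its own column/feature
X_poly = np.column_stack([x, x**2, x**3])
print(X_poly)
# [[  1.   1.   1.]
#  [  2.   4.   8.]
#  [  3.   9.  27.]
#  [  4.  16.  64.]
#  [  5.  25. 125.]]
```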

Polynomial regression still uses “least squares“, but now we have an “overdetermined system”, meaning there are more equations (data points) than unknowns (coefficients).

Therefore, we rely on matrix algebra to solve for the best fit.
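
As a rough sketch of what happens under the hood (the data here are simulated), NumPy can solve that overdetermined system directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.shape)

# Design matrix: intercept column, x, and x^2
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution to the overdetermined system X @ beta ≈ y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [2, -1, 0.5]
```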

In practice, this is handled by our statistical software; we just need to select the “degree” of the polynomial.
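
With scikit-learn, for instance, that choice is just the `degree` parameter (a sketch using the same kind of simulated data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel()**2 - X.ravel() + 2 + rng.normal(scale=0.5, size=50)

# We only pick the degree; the feature expansion and least-squares
# fit are handled by the library.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # prediction for x = 1.5
```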

Our selection should fit the training data well while still generalizing to new data.

This challenge is known as the “bias-variance tradeoff”.

As we increase the degree, the bias decreases, but the variance increases. We want to stop at the degree where the combined error from these two sources is minimized.
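
One common way to look for that sweet spot is cross-validation over a range of degrees; here is a sketch (again with simulated data), where the degree with the lowest cross-validated error is the one to keep:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = 0.5 * X.ravel()**2 - X.ravel() + 2 + rng.normal(scale=0.5, size=80)

# Low degrees underfit (high bias); high degrees overfit (high variance).
# The cross-validated error tends to bottom out at a degree in between.
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: cross-validated MSE = {mse:.3f}")
```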

Overall, polynomial regression is an essential tool in your machine learning arsenal. However, it is very sensitive to outliers, and we must take care when selecting the degree to avoid overfitting.