*To see this with sample code, check out my Kaggle notebook.*
Like Simple Linear Regression, Multiple Regression is a “supervised” “regression” algorithm.
data:image/s3,"s3://crabby-images/7817e/7817e84f0042db925885902f73ea4fc560ce3223" alt=""
Supervised means we use labeled data to train the model.
data:image/s3,"s3://crabby-images/15990/15990a06dd80777553d12e3059ec324cd094435e" alt=""
Regression means we predict a numerical value instead of a “class”.
data:image/s3,"s3://crabby-images/a6bdd/a6bdd6c52a7b21850bf3afac52cbb894f582db15" alt=""
However, we now have multiple independent variables that impact the dependent variable.
data:image/s3,"s3://crabby-images/a8367/a8367bbecd3fd2766830de21857e90a2ad5b5618" alt=""
data:image/s3,"s3://crabby-images/e0a42/e0a42441461b9f372de7261231b6c6b2f4090481" alt=""
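Least Squares is still used, but instead of fitting a line to the data, we fit an (n-1)-dimensional hyperplane, where n is the total number of variables (the independent variables plus the dependent one). For example, 3D data (two independent variables and one dependent variable) is fit with a 2D plane.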
data:image/s3,"s3://crabby-images/0d018/0d018f30a0623a697028aa6c7679e6ee783a4998" alt=""
Before applying Multiple Regression, we must check four key assumptions, which can be tested with any statistical software.
data:image/s3,"s3://crabby-images/2de10/2de101ba456b55d9a730001bde0d52a6c0f91b0e" alt=""
Next, we must select our independent variables, maximizing accuracy while minimizing the number of variables used. This balance is called “parsimony”.
data:image/s3,"s3://crabby-images/ec1c7/ec1c76213373c4bf556562d9ffdb6341428789df" alt=""
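There are multiple approaches to making this decision, such as “backward elimination” (start with every candidate variable and remove the least useful one at a time) and “forward selection” (start with none and add the most useful one at a time).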
data:image/s3,"s3://crabby-images/5b0bd/5b0bdb2fb2426b0bc9b2bb4e88ededecf277794e" alt=""
data:image/s3,"s3://crabby-images/45d4f/45d4f0a33bf8ef57849cac99c767531925147aae" alt=""
Once we’ve completed the regression, we evaluate the fit with the “R² score”, which tells us what proportion of the variance in the dependent variable our model explains (1 means the predictions match the data perfectly).
data:image/s3,"s3://crabby-images/c848f/c848f7c208eb6ebaa739e098f5d285a55aa43b3a" alt=""
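We use an “F-test” to find the “p-value”, which is the probability of getting results at least as extreme as ours if, in reality, none of the independent variables affected the dependent variable (i.e., if chance alone were at work). Typically, the findings are considered “significant” if p < 0.05 (5%).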
data:image/s3,"s3://crabby-images/c27fc/c27fc13c5497d58a951324fb5c5c1c6c0083a691" alt=""
Overall, Multiple Regression is widely applicable to real-world problems; however, practitioners must test the assumptions and apply parsimony before drawing valid conclusions.