Multiple linear regression is a useful way to quantify the relationship between two or more predictor variables and a response variable.
Typically when we perform multiple linear regression, the resulting regression coefficients are unstandardized, meaning the model is fit to the raw data and each coefficient is expressed in the original units of its variable.
However, when the predictor variables are measured on drastically different scales, it can be useful to perform multiple linear regression using standardized data, which results in standardized coefficients.
To help you wrap your head around this idea, let’s walk through a simple example.
Example: Standardized vs. Unstandardized Regression Coefficients
Suppose we have the following dataset that contains information about the age, square footage, and selling price of 12 houses:
Suppose we then perform multiple linear regression, using age and square footage as the predictor variables and price as the response variable. Here is the regression output:
The regression coefficients in this table are unstandardized, meaning the model was fit to the raw data. At first glance, it appears that age has a much larger effect on house price, since its coefficient in the regression table is -409.833 compared to just 100.866 for square footage.
However, the standard error is much larger for age compared to square footage, which is why the corresponding p-value is actually large for age (p=0.520) and small for square footage (p=0.000).
The extreme difference between the two regression coefficients is due to the extreme difference in scales for the two variables:
- The values for age range from 4 to 44.
- The values for square footage range from 1,200 to 2,800.
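One rough way to see the role of scale is to multiply each unstandardized coefficient by the width of its predictor's observed range. The sketch below uses only the coefficients and ranges quoted above. The "swing" calculation itself is an illustrative addition, not part of the original regression output.

```python
# Unstandardized coefficients and observed ranges quoted in the text
coef = {"age": -409.833, "square footage": 100.866}
value_range = {"age": (4, 44), "square footage": (1200, 2800)}

# Implied change in predicted price when one predictor moves across its
# entire observed range while the other predictor is held constant
swing = {}
for name, b in coef.items():
    low, high = value_range[name]
    swing[name] = b * (high - low)
    print(f"{name}: {swing[name]:,.2f}")
```

Across its full range, age shifts the predicted price by roughly $16,400, while square footage shifts it by roughly $161,400, even though the raw coefficient for age is about four times larger in magnitude.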
Suppose we instead standardize the original raw data by converting each original data value to a z-score:
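As a minimal sketch, the z-score conversion could be written in Python as follows. The `ages` values are hypothetical and are not the article's dataset, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which convention was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in values]

# Hypothetical house ages in years, for illustration only
ages = [4, 10, 15, 22, 30, 44]
z_age = z_scores(ages)
print([round(z, 3) for z in z_age])
```

After the conversion, each variable has a mean of 0 and a standard deviation of 1, so all variables are on the same scale.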
If we then perform multiple linear regression using the standardized data, we’ll get the following regression output:
The regression coefficients in this table are standardized, meaning the model was fit to the standardized data. The way to interpret the coefficients in the table is as follows:
- A one standard deviation increase in age is associated with a 0.092 standard deviation decrease in house price, assuming square footage is held constant.
- A one standard deviation increase in square footage is associated with a 0.885 standard deviation increase in house price, assuming age is held constant.
We can immediately see that square footage has a much larger effect on house price than age. Also note that the p-value for each predictor variable is exactly the same as in the previous regression model.
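The sketch below illustrates why the p-values do not change. It fits the same two-predictor model to raw and to standardized data and compares the t-statistics of the slopes. The data are hypothetical (not the article's 12 houses), and `fit` is a small helper written for this illustration that solves the least squares normal equations on mean-centered data. In practice a statistics library would normally be used instead.

```python
from math import sqrt
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def fit(x1, x2, y):
    """Least squares fit of y on x1 and x2 with an intercept.

    Returns ((b1, b2), (t1, t2)): the two slopes and their t-statistics.
    """
    n = len(y)
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Centered sums of squares and cross-products
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the slopes
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Residual variance with n - 3 degrees of freedom (intercept + 2 slopes)
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = rss / (n - 3)
    se1 = sqrt(sigma2 * s22 / det)
    se2 = sqrt(sigma2 * s11 / det)
    return (b1, b2), (b1 / se1, b2 / se2)

# Hypothetical data for illustration only
age = [4, 7, 10, 15, 20, 25, 30, 44]
sqft = [2800, 2100, 1900, 2400, 1500, 1800, 1200, 1600]
price = [295000, 221000, 198000, 251000, 160000, 185000, 131000, 158000]

b_raw, t_raw = fit(age, sqft, price)
b_std, t_std = fit(z_scores(age), z_scores(sqft), z_scores(price))

print("unstandardized slopes:", [round(b, 3) for b in b_raw])
print("standardized slopes:  ", [round(b, 3) for b in b_std])
print("t-statistics (raw):   ", [round(t, 3) for t in t_raw])
print("t-statistics (z):     ", [round(t, 3) for t in t_std])
```

Standardizing only shifts and rescales each variable, so each slope and its standard error are rescaled by the same factor. The t-statistics and degrees of freedom are therefore unchanged, which is why the p-values match between the two models.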
When to Use Standardized vs. Unstandardized Regression Coefficients
Standardized and unstandardized regression coefficients can both be useful depending on the situation. In particular:
Unstandardized regression coefficients are useful when you want to interpret the effect that a one unit change in a predictor variable has on a response variable. In the example above, we could use the unstandardized regression coefficients from the first regression to describe the relationship between each predictor variable and the response variable in its original units:
- A one unit increase in age was associated with an average decrease of about $410 in house price, assuming square footage was held constant. This coefficient turned out to not be statistically significant (p=0.520).
- A one unit increase in square footage was associated with an average increase of about $101 in house price, assuming age was held constant. This coefficient turned out to be statistically significant (p=0.000).
Standardized regression coefficients are useful when you want to compare the effect that different predictor variables have on a response variable. Since every variable is on the same scale after standardization, you can compare the magnitudes of the coefficients directly to see which variable has the greatest effect on the response variable.
One downside of standardized regression coefficients is that they’re a bit harder to interpret. For example, it’s easier to understand the effect that a one unit increase in age has on house price than the effect that a one standard deviation increase has on house price.
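The two kinds of coefficients carry the same information and can be converted into one another. As a sketch, using notation that does not appear in the original article (b_j for the unstandardized coefficient of predictor x_j, β*_j for its standardized counterpart, and s for a standard deviation), the relationship is:

```latex
\beta_j^{*} = b_j \cdot \frac{s_{x_j}}{s_y}
\qquad\Longleftrightarrow\qquad
b_j = \beta_j^{*} \cdot \frac{s_y}{s_{x_j}}
```

This holds when the predictors and the response are all converted to z-scores using the same standard deviation convention. It means you can report standardized coefficients for comparison and still recover the effect in original units when you need it.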