A partial F-test is used to determine whether or not there is a statistically significant difference between a regression model and some nested version of the same model.
A nested model is simply one that contains a subset of the predictor variables in the overall regression model.
For example, suppose we have the following regression model with four predictor variables:
Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε
One example of a nested model would be the following model with only two of the original predictor variables:
Y = β0 + β1x1 + β2x2 + ε
To determine if these two models are significantly different, we can perform a partial F-test.
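As a rough illustration of what "nested" means, the sketch below checks whether every term of one model formula also appears in another. The helper name is_nested and the variable names y, x1 to x4 are hypothetical, and the check only compares term labels; it assumes both formulas share the same response and does not handle transformed terms.

```r
# Hypothetical helper: TRUE if every term in `reduced` also appears in `full`.
# Only term labels are compared; the response variable is assumed to match.
is_nested <- function(reduced, full) {
  reduced_terms <- attr(terms(reduced), "term.labels")
  full_terms <- attr(terms(full), "term.labels")
  all(reduced_terms %in% full_terms)
}

full_formula <- y ~ x1 + x2 + x3 + x4
reduced_formula <- y ~ x1 + x2

nested <- is_nested(reduced_formula, full_formula)
```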
Partial F-Test: The Basics
A partial F-test calculates the following F test-statistic:
F = ((RSSreduced - RSSfull) / p) / (RSSfull / (n - k))
- RSSreduced: The residual sum of squares of the reduced (i.e. “nested”) model.
- RSSfull: The residual sum of squares of the full model.
- p: The number of predictors removed from the full model.
- n: The total number of observations in the dataset.
- k: The number of coefficients (including the intercept) in the full model.
Note that the residual sum of squares of the full model can never be larger than that of the reduced model, since adding predictors to a least-squares fit cannot increase the residual error.
Thus, a partial F-test essentially tests whether the group of predictors that you removed from the full model are actually useful and need to be included in the full model.
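The formula can be sketched directly in R. The helper below, partial_f_stat, is a hypothetical name rather than a built-in function. It relies on deviance() returning the residual sum of squares for an lm fit and on df.residual() returning n - k. The mtcars models are used only as an illustrative pair of nested fits.

```r
# Hypothetical helper: partial F statistic from two nested lm fits.
partial_f_stat <- function(reduced, full) {
  rss_reduced <- deviance(reduced)   # RSS of the reduced (nested) model
  rss_full <- deviance(full)         # RSS of the full model
  df_full <- df.residual(full)       # n - k for the full model
  p <- df.residual(reduced) - df_full  # number of predictors removed
  ((rss_reduced - rss_full) / p) / (rss_full / df_full)
}

# Illustrative nested pair using the built-in mtcars data
fit_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
fit_reduced <- lm(mpg ~ disp + carb, data = mtcars)

f_manual <- partial_f_stat(fit_reduced, fit_full)
```

The result should agree with the F value that anova() reports for the same pair of models.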
This test uses the following null and alternative hypotheses:
H0: All coefficients removed from the full model are zero.
HA: At least one of the coefficients removed from the full model is non-zero.
If the p-value corresponding to the F test-statistic is below the chosen significance level (e.g. 0.05), then we reject the null hypothesis and conclude that at least one of the coefficients removed from the full model is non-zero.
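The p-value is the upper-tail probability of an F distribution with p and n - k degrees of freedom, which base R exposes through pf(). The sketch below wraps this in a hypothetical helper, partial_f_pvalue. The input values are made up purely to show the call and do not come from any dataset.

```r
# Hypothetical helper: upper-tail p-value for a partial F statistic.
#   f_stat: the F test-statistic
#   p:      number of predictors removed from the full model (numerator df)
#   n:      number of observations
#   k:      number of coefficients in the full model, including the intercept
partial_f_pvalue <- function(f_stat, p, n, k) {
  pf(f_stat, df1 = p, df2 = n - k, lower.tail = FALSE)
}

alpha <- 0.05  # assumed significance level
# Illustrative (made-up) inputs, only to demonstrate the call
p_value <- partial_f_pvalue(f_stat = 4.2, p = 2, n = 50, k = 5)
reject_null <- p_value < alpha
```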
Partial F-Test: An Example
In practice, we use the following steps to perform a partial F-test:
1. Fit the full regression model and calculate RSSfull.
2. Fit the nested regression model and calculate RSSreduced.
3. Perform an ANOVA to compare the full and reduced model, which will produce the F test-statistic needed to compare the models.
For example, the following code shows how to fit the following two regression models in R using data from the built-in mtcars dataset:
Full model: mpg = β0 + β1disp + β2carb + β3hp + β4cyl
Reduced model: mpg = β0 + β1disp + β2carb
#fit full model
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)

#fit reduced model
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

#perform ANOVA to test for differences in models
anova(model_reduced, model_full)

Analysis of Variance Table

Model 1: mpg ~ disp + carb
Model 2: mpg ~ disp + carb + hp + cyl
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     29 254.82
2     27 238.71  2    16.113 0.9113  0.414
From the output we can see that the F test-statistic from the ANOVA is 0.9113 and the corresponding p-value is 0.414.
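As a check, the F test-statistic can be recomputed by hand from the table. Here p = 2 predictors were removed, and the full model has n - k = 32 - 5 = 27 residual degrees of freedom (mtcars has 32 rows):

```latex
F = \frac{(RSS_{reduced} - RSS_{full})/p}{RSS_{full}/(n-k)}
  = \frac{16.113/2}{238.71/27}
  \approx \frac{8.0565}{8.8411}
  \approx 0.9113
```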
Since this p-value is not less than .05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that the coefficient of either hp or cyl is non-zero.
In other words, adding hp and cyl to the regression model does not significantly improve the fit of the model.
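If the decision needs to be made programmatically rather than read off the printed table, the values can be pulled out of the object that anova() returns. This is a minimal sketch; the names comparison, alpha and reject_null are illustrative, and the 0.05 threshold is an assumed choice.

```r
# Refit the two models from the example (mtcars ships with base R)
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

# anova() returns a data frame; row 2 holds the comparison of the two models
comparison <- anova(model_reduced, model_full)
f_stat <- comparison[2, "F"]
p_value <- comparison[2, "Pr(>F)"]

alpha <- 0.05  # assumed significance level
reject_null <- p_value < alpha
```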