How to Calculate the P-Value of an F-Statistic in R

An F-test produces an F-statistic. To find the p-value associated with an F-statistic in R, you can use the pf() function as follows:

pf(fstat, df1, df2, lower.tail = FALSE)

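  • fstat – the value of the F-statistic (passed as the first argument of pf())
  • df1 – the numerator degrees of freedom
  • df2 – the denominator degrees of freedom
  • lower.tail – whether to return the probability in the lower tail of the F distribution, P(F ≤ fstat). This is TRUE by default, so we set it to FALSE to get the upper-tail probability P(F > fstat), which is the p-value.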

For example, here is how to find the p-value associated with an F-statistic of 5, with 3 numerator degrees of freedom and 14 denominator degrees of freedom:

pf(5, 3, 14, lower.tail = FALSE)

#[1] 0.01457807
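
As a quick check on what lower.tail does, the sketch below compares the upper-tail call with one minus the default lower-tail probability. The two should agree up to floating-point error. The variable names p_upper and p_lower are illustrative only.

```r
# Upper-tail probability: P(F > 5) for an F distribution with 3 and 14 df
p_upper <- pf(5, 3, 14, lower.tail = FALSE)

# The default call returns the lower-tail probability: P(F <= 5)
p_lower <- pf(5, 3, 14)

# The two tails sum to 1, so these should match
all.equal(p_upper, 1 - p_lower)
```

Using lower.tail = FALSE is generally preferable to computing 1 - pf(...), since it avoids losing precision when the p-value is very small.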

One of the most common uses of an F-test is for testing the overall significance of a regression model. In the following example, we show how to calculate the p-value of the F-statistic for a regression model.

Example: Calculating p-value from F-statistic

Suppose we have a dataset that shows the total number of hours studied, total prep exams taken, and final exam score received for 12 different students:

#create dataset
data <- data.frame(study_hours = c(3, 7, 16, 14, 12, 7, 4, 19, 4, 8, 8, 3),
                   prep_exams = c(2, 6, 5, 2, 7, 4, 4, 2, 8, 4, 1, 3),
                   final_score = c(76, 88, 96, 90, 98, 80, 86, 89, 68, 75, 72, 76))

#view first six rows of dataset
head(data)

#  study_hours prep_exams final_score
#1           3          2          76
#2           7          6          88
#3          16          5          96
#4          14          2          90
#5          12          7          98
#6           7          4          80

Next, we can fit a linear regression model to this data using study hours and prep exams as the predictor variables and final score as the response variable. Then, we can view the output of the model:

#fit regression model
model <- lm(final_score ~ study_hours + prep_exams, data = data)

#view output of the model
summary(model)

#Call:
#lm(formula = final_score ~ study_hours + prep_exams, data = data)
#
#Residuals:
#    Min      1Q  Median      3Q     Max 
#-13.128  -5.319   2.168   3.458   9.341 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)   66.990      6.211  10.785  1.9e-06 ***
#study_hours    1.300      0.417   3.117   0.0124 *  
#prep_exams     1.117      1.025   1.090   0.3041    
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 7.327 on 9 degrees of freedom
#Multiple R-squared:  0.5308,	Adjusted R-squared:  0.4265 
#F-statistic: 5.091 on 2 and 9 DF,  p-value: 0.0332

On the very last line of the output we can see that the F-statistic for the overall regression model is 5.091. This F-statistic has 2 degrees of freedom for the numerator and 9 degrees of freedom for the denominator. R automatically calculates that the p-value for this F-statistic is 0.0332.
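
For reference, the degrees of freedom and the F-statistic can be tied back to other numbers in the same output. The sketch below assumes the standard relationship F = (R²/k) / ((1 − R²)/(n − k − 1)) for a model with an intercept, where k is the number of predictors and n is the number of observations. The variable names are illustrative, and the inputs are the rounded values printed above, so the result only approximately matches 5.091.

```r
# Values read off the regression output above (rounded)
r_squared <- 0.5308
n <- 12   # number of students (observations)
k <- 2    # number of predictors: study_hours and prep_exams

# Degrees of freedom for the overall F-test
df1 <- k           # numerator
df2 <- n - k - 1   # denominator

# F-statistic implied by R-squared
f_manual <- (r_squared / df1) / ((1 - r_squared) / df2)
f_manual
```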

To reproduce the p-value reported in the summary ourselves, we can pass the F-statistic and its two degrees of freedom to pf():

pf(5.091, 2, 9, lower.tail = FALSE)

#[1] 0.0331947

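Notice that we get the same p-value as the linear regression output above, just with more decimal places displayed. Because we entered the F-statistic rounded to 5.091, the trailing digits may differ slightly from the value R computes internally.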
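
To avoid retyping rounded numbers, the F-statistic and its degrees of freedom can also be read directly from the model summary. The following is a minimal sketch that rebuilds the dataset and model from above so that it runs on its own. It relies on the fstatistic element of the object returned by summary(), a named vector with entries value, numdf, and dendf. The names fstat and p_value are illustrative.

```r
# Recreate the dataset and model from the example above
data <- data.frame(study_hours = c(3, 7, 16, 14, 12, 7, 4, 19, 4, 8, 8, 3),
                   prep_exams = c(2, 6, 5, 2, 7, 4, 4, 2, 8, 4, 1, 3),
                   final_score = c(76, 88, 96, 90, 98, 80, 86, 89, 68, 75, 72, 76))
model <- lm(final_score ~ study_hours + prep_exams, data = data)

# Named numeric vector: value (F-statistic), numdf, dendf
fstat <- summary(model)$fstatistic

# Upper-tail probability of the unrounded F-statistic
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)
p_value
```

The result should agree with the p-value of 0.0332 shown on the last line of the summary output.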
