In statistics, the term y hat (written as ŷ) refers to the estimated value of a response variable in a linear regression model.
We typically write an estimated regression equation as follows:
ŷ = β0 + β1x
- ŷ: The estimated value of the response variable
- β0: The average value of the response variable when the predictor variable is zero
- β1: The average change in the response variable associated with a one unit increase in the predictor variable
For example, suppose we have the following dataset that shows the number of hours studied by six different students along with their final exam scores:
Suppose we use some statistical software (like R, Excel, Python, or even by hand) to fit the following regression model using hours studied as the predictor variable and exam score as the response variable:
Score = 66.615 + 5.0769*(Hours)
The way to interpret the regression coefficients in this model is as follows:
- The average exam score for a student who studies zero hours is 66.615.
- Exam score increases by an average of 5.0769 points for each additional hour studied.
We can use this regression equation to estimate the score of a given student based on the number of hours they studied.
For example, a student who studies for 3 hours is predicted to get a score of:
Score = 66.615 + 5.0769*(3) = 81.85
Why is Y Hat Used?
The “hat” symbol in statistics is used to denote any term that is “estimated.” For example, ŷ is used to denote an estimated response variable.
Typically when we fit linear regression models, we use a sample of data from a population since it’s more convenient and less time-consuming than collecting data for every possible observation in a population.
So, when we find a regression equation we’re only estimating the true relationship between a predictor variable and a response variable.
This is why we use the term ŷ in the regression equation instead of y.