The log-likelihood value of a regression model is a way to measure the goodness of fit for a model. The higher the value of the log-likelihood, the better a model fits a dataset.
The log-likelihood value for a given model can range from negative infinity to positive infinity. On its own, the value is hard to interpret because it depends on the sample size and the scale of the response variable, but it’s useful for comparing two or more models fit to the same dataset.
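To make this concrete, here is a sketch of the formula under one common assumption: an ordinary least squares regression with normally distributed errors, where the error variance is estimated by maximum likelihood. The symbols n (number of observations) and RSS (residual sum of squares) are notation introduced here for illustration.

```latex
\ell(\hat\beta, \hat\sigma^2)
  = -\frac{n}{2}\log\!\left(2\pi\hat\sigma^2\right)
    - \frac{1}{2\hat\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2
  = -\frac{n}{2}\left[\log(2\pi) + \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 1\right],
\qquad
\hat\sigma^2 = \frac{\mathrm{RSS}}{n}
```

Under this assumption, a smaller RSS gives a higher log-likelihood, and rescaling the response variable (for example, dollars instead of thousands of dollars) changes the value without changing the fit. That is why the number is mainly useful for comparisons.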
In practice, we often fit several regression models to a dataset and choose the model with the highest log-likelihood value as the model that fits the data best.
The following example shows how to interpret log-likelihood values for different regression models in practice.
Example: Interpreting Log-Likelihood Values
Suppose we have a dataset that shows the number of bedrooms, number of bathrooms, and selling price of 20 different houses in a particular neighborhood. The values are defined in the R code below.
Suppose we’d like to fit the following two regression models and determine which one offers a better fit to the data:
Model 1: Price = β0 + β1(number of bedrooms)
Model 2: Price = β0 + β1(number of bathrooms)
The following code shows how to fit each regression model and calculate the log-likelihood value of each model in R:
#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 baths=c(2, 1, 4, 3, 2, 2, 3, 5, 4, 3, 4, 4, 3, 4, 2, 4, 3, 5, 6, 7),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit models
model1 <- lm(price~beds, data=df)
model2 <- lm(price~baths, data=df)

#calculate log-likelihood value of each model
logLik(model1)

'log Lik.' -91.04219 (df=3)

logLik(model2)

'log Lik.' -111.7511 (df=3)
The first model has a higher log-likelihood value (-91.04) than the second model (-111.75), which means the first model offers a better fit to the data.
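To see where these numbers come from, the following sketch recomputes the log-likelihood of the first model by hand. It assumes the standard normal-errors linear model with the error variance estimated as RSS/n; the variable names n, rss, and manual_ll are chosen here for illustration.

```r
#define data (same values as in the example above)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit model
model1 <- lm(price~beds, data=df)

#number of observations and residual sum of squares
n <- nrow(df)
rss <- sum(resid(model1)^2)

#log-likelihood assuming normal errors with variance estimated as rss/n
manual_ll <- -n/2 * (log(2*pi) + log(rss/n) + 1)

#compare the manual value with the value reported by logLik()
manual_ll
as.numeric(logLik(model1))
```

The two printed values should agree. The df=3 reported by logLik() counts the estimated parameters: the intercept, the slope, and the error variance.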
Cautions on Using Log-Likelihood Values
When comparing log-likelihood values, it’s important to note that adding more predictor variables to a model will never decrease the log-likelihood value, and will almost always increase it, even if the additional predictor variables aren’t statistically significant.
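A small sketch can illustrate this effect. Here a column of random noise is added as a second predictor; the column name noise and the seed value are arbitrary choices made for this illustration and are not part of the original dataset.

```r
#define data (same values as in the example above)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#add a predictor that is pure random noise (hypothetical column)
set.seed(1)
df$noise <- rnorm(nrow(df))

#fit the original model and the model with the extra noise predictor
model1 <- lm(price~beds, data=df)
model_noise <- lm(price~beds+noise, data=df)

#the log-likelihood of the larger model is at least as high
ll_base <- as.numeric(logLik(model1))
ll_noise <- as.numeric(logLik(model_noise))
ll_base
ll_noise
ll_noise >= ll_base
```

Even though the noise column has no real relationship with price, the larger model can only match or reduce the residual sum of squares, so its log-likelihood cannot be lower.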
This means you should only compare the log-likelihood values between two regression models if each model has the same number of predictor variables.
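To compare models with different numbers of predictor variables, you can perform a likelihood-ratio test. This test compares the goodness of fit of two nested regression models, that is, models in which the predictors of one are a subset of the predictors of the other.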
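As a rough sketch, a likelihood-ratio test can be carried out with base R alone. The example below compares a reduced model (bedrooms only) with a full model (bedrooms and bathrooms); this particular pair of models is chosen here for illustration, and the test statistic is compared against a chi-square distribution, which is an approximation that works best with larger samples.

```r
#define data (same values as in the example above)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 baths=c(2, 1, 4, 3, 2, 2, 3, 5, 4, 3, 4, 4, 3, 4, 2, 4, 3, 5, 6, 7),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit nested models: the reduced model's predictors are a subset of the full model's
reduced <- lm(price~beds, data=df)
full <- lm(price~beds+baths, data=df)

#extract log-likelihood objects
ll_reduced <- logLik(reduced)
ll_full <- logLik(full)

#test statistic: twice the difference in log-likelihoods
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_reduced))

#degrees of freedom: difference in number of estimated parameters
lr_df <- attr(ll_full, "df") - attr(ll_reduced, "df")

#p-value from the chi-square distribution
p_value <- pchisq(lr_stat, df=lr_df, lower.tail=FALSE)

lr_stat
lr_df
p_value
```

A small p-value suggests that the extra predictor improves the fit by more than would be expected by chance, while a large p-value suggests that the simpler model is adequate.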