The **log-likelihood value** of a regression model is a way to measure the goodness of fit for a model. The higher the value of the log-likelihood, the better a model fits a dataset.

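The log-likelihood value for a given model can range from negative infinity to positive infinity. The actual log-likelihood value for a given model is mostly meaningless on its own because it depends on the sample size and the units of the response variable, but **it’s useful for comparing two or more models** fit to the same dataset.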
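As a rough illustration of why the raw value carries little meaning, the sketch below fits the same simple regression twice in R: once with the response in its original units and once with the response divided by 1,000. The data and the names `x`, `y`, `fit_raw`, and `fit_scaled` are made up for this illustration and are not part of the example that follows.

```r
# Made-up data for illustration only (not the housing data used later)
set.seed(1)
x <- 1:20
y <- 100 + 60 * x + rnorm(20, sd = 25)

# Same model, same fit quality, different units for the response
fit_raw    <- lm(y ~ x)
fit_scaled <- lm(I(y / 1000) ~ x)

ll_raw    <- as.numeric(logLik(fit_raw))
ll_scaled <- as.numeric(logLik(fit_scaled))

ll_raw               # negative for this data
ll_scaled            # positive for this data
ll_scaled - ll_raw   # equals n * log(1000), with n = 20 observations
```

The fit is identical in both cases, yet the log-likelihood changes sign. This is why log-likelihood values should only be compared between models fit to the same response variable on the same dataset.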

In practice, we often fit several regression models to a dataset and choose the model with the highest log-likelihood value as the model that fits the data best.

The following example shows how to interpret log-likelihood values for different regression models in practice.

**Example: Interpreting Log-Likelihood Values**

Suppose we have a dataset that shows the number of bedrooms, number of bathrooms, and selling price of 20 different houses in a particular neighborhood (the values are listed in the R code below).

Suppose we’d like to fit the following two regression models and determine which one offers a better fit to the data:

**Model 1**: Price = β₀ + β₁(number of bedrooms)

**Model 2**: Price = β₀ + β₁(number of bathrooms)

The following code shows how to fit each regression model and calculate the log-likelihood value of each model in R:

```r
#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 baths=c(2, 1, 4, 3, 2, 2, 3, 5, 4, 3,
                         4, 4, 3, 4, 2, 4, 3, 5, 6, 7),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit models
model1 <- lm(price~beds, data=df)
model2 <- lm(price~baths, data=df)

#calculate log-likelihood value of each model
logLik(model1)

'log Lik.' -91.04219 (df=3)

logLik(model2)

'log Lik.' -111.7511 (df=3)
```

The first model has a higher log-likelihood value (**-91.04**) than the second model (**-111.75**), which means the first model offers a better fit to the data.
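To see where a value like this comes from, the sketch below recomputes the log-likelihood of the first model by hand. It assumes the usual linear regression setup of normally distributed errors and uses the maximum likelihood estimate of the error standard deviation, which divides the residual sum of squares by n rather than by n - 2. The names `res`, `sigma_mle`, and `ll_manual` are illustrative.

```r
# Same data as in the example above (only the columns needed here)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

model1 <- lm(price~beds, data=df)

# residuals and the ML estimate of the error standard deviation
res <- residuals(model1)
n <- length(res)
sigma_mle <- sqrt(sum(res^2) / n)

# sum of the log normal densities of the residuals
ll_manual <- sum(dnorm(res, mean=0, sd=sigma_mle, log=TRUE))

ll_manual
as.numeric(logLik(model1))
```

Both lines should print the same number, since `logLik()` for an `lm` object is the sum of the log normal densities of the residuals evaluated at the maximum likelihood estimates.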

**Cautions on Using Log-Likelihood Values**

When calculating log-likelihood values, it’s important to note that adding more predictor variables to a model will never decrease the log-likelihood value and will almost always increase it, even if the additional predictor variables aren’t statistically significant.
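A quick way to check this behavior is to add a predictor that is pure random noise. The sketch below does so with the housing data from the example; the `noise` column and the `model_noise` name are made up for this illustration, and the seed is arbitrary.

```r
# Same data as in the example above (only the columns needed here)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

# add a predictor that has no real relationship with price
set.seed(42)
df$noise <- rnorm(nrow(df))

#fit the original model and the model with the extra noise predictor
model1 <- lm(price~beds, data=df)
model_noise <- lm(price~beds+noise, data=df)

#compare log-likelihood values
logLik(model1)
logLik(model_noise)
```

The model with the noise predictor should report a log-likelihood at least as high as the original model, even though the extra variable carries no information about price.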

This means you should only compare the log-likelihood values between two regression models if each model has the same number of predictor variables.

To compare models with different numbers of predictor variables, you can perform a likelihood-ratio test to compare the goodness of fit of two nested regression models.
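As a minimal sketch of such a test using only base R, the code below compares a reduced model (bedrooms only) with a full model (bedrooms and bathrooms) fit to the housing data from the example. The test statistic is twice the difference in log-likelihoods, compared against a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters. The names `reduced`, `full`, `lr_stat`, `df_diff`, and `p_value` are illustrative; the `lrtest()` function from the `lmtest` package is a packaged alternative.

```r
#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 baths=c(2, 1, 4, 3, 2, 2, 3, 5, 4, 3,
                         4, 4, 3, 4, 2, 4, 3, 5, 6, 7),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit nested models: the reduced model is a special case of the full model
reduced <- lm(price~beds, data=df)
full <- lm(price~beds+baths, data=df)

ll_reduced <- logLik(reduced)
ll_full <- logLik(full)

#likelihood-ratio statistic and its degrees of freedom
lr_stat <- 2 * (as.numeric(ll_full) - as.numeric(ll_reduced))
df_diff <- attr(ll_full, "df") - attr(ll_reduced, "df")

#p-value from the chi-square distribution
p_value <- pchisq(lr_stat, df=df_diff, lower.tail=FALSE)

lr_stat
p_value
```

A small p-value would suggest that the full model fits significantly better than the reduced model; a large p-value would suggest the extra predictor is not worth including.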

**Additional Resources**

The following tutorials explain how to perform other common tasks in R:

How to Use lm() Function to Fit Linear Models in R

How to Perform a Likelihood Ratio Test in R

Hi, I would like to ask you about your statement that the log-likelihood value will almost always increase with the addition of predictor variables (even if the predictor variables are not significant). My question is, what is your source for this? Please share it, as I really need it for my final project. Furthermore, if you are willing, may I ask for your email address? Otherwise, my email address is santosamuel97@gmail.com (again, if you are willing to be asked through email).

Any help would be greatly appreciated. Thank you in advance.

Or you can use the Bayesian information criterion (BIC), which is based on the same log-likelihood you described but adds a penalty for the number of predictor variables (parameters):

BIC = -2 * LL + log(N) * k

Here log() is the natural (base-e) logarithm, LL is the log-likelihood of the model, N is the number of examples in the training dataset, and k is the number of parameters in the model.

The score as defined above is minimized, i.e. the model with the lowest BIC is selected.
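The formula can be checked against R's built-in `BIC()` function. The sketch below uses the first model from the article and takes k from the `df` attribute reported by `logLik()`, which for a linear model counts the intercept, the slope, and the residual variance. The name `bic_manual` is illustrative.

```r
# Same data as in the article (only the columns needed here)
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

model1 <- lm(price~beds, data=df)

ll <- logLik(model1)
k <- attr(ll, "df")   # number of estimated parameters
n <- nobs(model1)     # number of observations

#BIC = -2 * LL + log(N) * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

bic_manual
BIC(model1)
```

Both lines should print the same value.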

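For discrete outcomes, the likelihood of any given data point is a probability and is at most 1, so the log-likelihood of any given data point is at most 0 and the total log-likelihood can only range from negative infinity to 0. For continuous outcomes, such as in linear regression, the likelihood is a probability density, which can exceed 1, so the log-likelihood can be positive.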

sick thanks zach.