How to Calculate Root Mean Square Error (RMSE) in Excel


In statistics, regression analysis is a technique we use to understand the relationship between a predictor variable, x, and a response variable, y. 

When we conduct regression analysis, we end up with a model that tells us the predicted value for the response variable based on the value of the predictor variable.

One way to assess how “good” our model fits a given dataset is to calculate the root mean square error, which is a metric that tells us how far apart our predicted values are from our observed values, on average.

The formula to find the root mean square error, more commonly referred to as RMSE, is as follows:

RMSE = √[ Σ(Pi – Oi)2 / n ]

where:

  • Σ is a fancy symbol that means “sum”
  • Pi is the predicted value for the ith observation in the dataset
  • Oi is the observed value for the ith observation in the dataset
  • n is the sample size

Nerd notes:

 

  • The root mean square error can be calculated for any type of model that produces predicted values, which can then be compared to the observed values of a dataset.
  • The root mean square error is also sometimes called the root mean square deviation, which is often abbreviated as RMSD.

Next, let’s look at an example of how to calculate root mean square error in Excel.

How to Calculate Root Mean Square Error in Excel

There is no built-in function to calculate RMSE in Excel, but we can calculate it fairly easily with a single formula. We’ll show how to calculate RMSE for two different scenarios.

Scenario 1

In one scenario, you might have one column that contains the predicted values of your model and another column that contains the observed values. The image below shows an example of this scenario:

Example of calculating RMSE in Excel for observed and predicted values

If this is the case, then you can calculate the RMSE by typing the following formula into any cell, and then clicking CTRL+SHIFT+ENTER:

=SQRT(SUMSQ(A2:A21-B2:B21) / COUNTA(A2:A21))

Example of calculating root mean square error in Excel

This tells us that the root mean square error is 2.6646.

RMSE calculation in Excel

The formula might look a bit tricky, but it makes sense once you break it down:

=SQRT(SUMSQ(A2:A21-B2:B21) / COUNTA(A2:A21))

  • First, we calculate the sum of the squared differences between the predicted and observed values using the SUMSQ() function.
  • Next, we divide by the sample size of the dataset using COUNTA(), which counts the number of cells in a range that are not empty.
  • Lastly, we take the square root of the whole calculation using the SQRT() function.

Scenario 2

In another scenario, you may have already calculated the differences between the predicted and observed values. In this case, you will only have one column that displays the differences.

The image below shows an example of this scenario. The predicted values are displayed in column A, the observed values in column B, and the difference between the predicted and observed values in column D:

Root mean square error example in Excel

If this is the case, then you can calculate the RMSE by typing the following formula into any cell, and then clicking CTRL+SHIFT+ENTER:

=SQRT(SUMSQ(D2:D21) / COUNTA(D2:D21))

RMSE in Excel

This tells us that the root mean square error is 2.6646, which matches the result that we got in the first scenario. This confirms that these two approaches to calculating RMSE are equivalent.

Root mean square error in Excel

The formula we used in this scenario is only slightly different than the one we used in the previous scenario:

=SQRT(SUMSQ(D2:D21) / COUNTA(D2:D21))

  • Since we already calculated the differences between the predicted and observed values in column D, we can calculate the sum of the squared differences by using the SUMSQ() function with just the values in column D.
  • Next, we divide by the sample size of the dataset using COUNTA(), which counts the number of cells in a range that are not empty.
  • Lastly, we take the square root of the whole calculation using the SQRT() function.

How to Interpret RMSE

As mentioned earlier, RMSE is a useful way to see how well a regression model (or any model that produces predicted values) is able to “fit” a dataset.

The larger the RMSE, the larger the difference between the predicted and observed values, which means the worse the regression model fits the data. Conversely, the smaller the RMSE, the better a model is able to fit the data.

It can be particularly useful to compare the RMSE of two different models with each other to see which model fits the data better.

For more tutorials in Excel, be sure to check out our Excel Guides Page, which lists every Excel tutorial on Statology.

Leave a Reply

Your email address will not be published. Required fields are marked *