# How to Calculate RMSE in SAS

One way to assess how well a regression model fits a dataset is to calculate the root mean square error, a metric that measures the typical distance between the values predicted by the model and the actual values in the dataset.

The lower the RMSE, the better a given model is able to “fit” a dataset.

The formula to find the root mean square error, often abbreviated RMSE, is as follows:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}}$$

where:

- $\Sigma$ is a symbol that represents “sum”
- $P_i$ is the predicted value for the $i$th observation in the dataset
- $O_i$ is the observed value for the $i$th observation in the dataset
- $n$ is the sample size
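To make the formula concrete, consider a small hypothetical set of three predictions and observations. These numbers are made up for illustration and are not part of the exam dataset used below:

$$
\begin{aligned}
P &= (2,\ 4,\ 6), \qquad O = (3,\ 4,\ 8) \\
\sum_{i=1}^{3} (P_i - O_i)^2 &= (2-3)^2 + (4-4)^2 + (6-8)^2 = 1 + 0 + 4 = 5 \\
\text{RMSE} &= \sqrt{5 / 3} \approx 1.291
\end{aligned}
$$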

The following step-by-step example shows how to calculate the RMSE for a simple linear regression model in SAS.

## Step 1: Create the Data

For this example, we’ll create a dataset that contains the total hours studied and final exam score for 15 students.

We’ll fit a simple linear regression model using hours as the predictor variable and score as the response variable.

The following code shows how to create this dataset in SAS:

```
/*create dataset*/
data exam_data;
input hours score;
datalines;
1 64
2 66
4 76
5 73
5 74
6 81
6 83
7 82
8 80
10 88
11 84
11 82
12 91
12 93
14 89
;
run;

/*view dataset*/
proc print data=exam_data;
run;
```

## Step 2: Fit the Simple Linear Regression Model

Next, we’ll use proc reg to fit the simple linear regression model:

```
/*fit simple linear regression model*/
proc reg data=exam_data;
model score = hours;
run;
quit;
```

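Notice that the Root MSE reported in the output is 3.64093. PROC REG computes this value by dividing the sum of squared residuals by the error degrees of freedom (n - 2 for a simple linear regression) instead of by n, so it is somewhat larger than the value the formula above would give.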
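As a rough check, the arithmetic below compares the two divisors. The sum of squared residuals (about 172.333) is our own hand calculation from the 15 data points in Step 1; it is not printed anywhere in this article's output, so treat it as an assumption you can verify yourself:

$$
\begin{aligned}
\text{SSE} &= \sum_{i=1}^{15} (P_i - O_i)^2 \approx 172.333 \\
\sqrt{\text{SSE} / (n - 2)} &= \sqrt{172.333 / 13} \approx 3.64093 \\
\sqrt{\text{SSE} / n} &= \sqrt{172.333 / 15} \approx 3.3895
\end{aligned}
$$

Dividing by n - 2 reproduces the Root MSE from PROC REG, while dividing by n gives the smaller value implied by the formula at the top of this article.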

## Step 3: Extract RMSE from Regression Model

If you only want to view the RMSE of this model and none of the other output results, you can use the following code:

```
/*fit simple linear regression model*/
proc reg data=exam_data outest=outest noprint;
model score = hours / rmse;
run;
quit;

/*print RMSE of model*/
proc print data=outest;
var _RMSE_;
run;
```

Notice that only the RMSE value of 3.64093 is shown in the output.

Note: The NOPRINT option in PROC REG suppresses the full regression output shown in the previous step, while the OUTEST= option saves the parameter estimates, including the _RMSE_ variable, to the dataset outest.
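If you want to compute RMSE yourself from the predicted values, for example to apply the divide-by-n formula from the start of this article, one possible approach is to save the predictions with the OUTPUT statement in PROC REG and aggregate them with PROC SQL. This is a sketch that assumes the exam_data dataset from Step 1 already exists; the names preds, pred, rmse_n, and rmse_df are our own illustrative choices:

```sas
/*fit the model without printing and save predicted values to preds*/
proc reg data=exam_data noprint;
   model score = hours;
   output out=preds p=pred;
run;
quit;

/*compute RMSE by hand with two different divisors*/
proc sql;
   select sqrt(sum((pred - score)**2) / count(*))       as rmse_n,  /*divide by n*/
          sqrt(sum((pred - score)**2) / (count(*) - 2)) as rmse_df  /*divide by n - 2, as PROC REG does*/
   from preds;
quit;
```

The rmse_df column should match the 3.64093 reported by PROC REG, while rmse_n follows the formula with n in the denominator and comes out smaller (about 3.39).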