How to Calculate RMSE in SAS


One way to assess how well a regression model fits a dataset is to calculate the root mean square error, a metric that measures the typical distance between the values predicted by the model and the actual values in the dataset.

The lower the RMSE, the better a given model is able to “fit” a dataset.

The formula to find the root mean square error, often abbreviated RMSE, is as follows:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}}$$

where:

  • Σ is a symbol that represents “sum”
  • Pi is the predicted value for the ith observation in the dataset
  • Oi is the observed value for the ith observation in the dataset
  • n is the sample size
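
As a quick illustration of the formula, the sketch below computes the RMSE for three made-up pairs of predicted and observed values. The dataset name toy and the variable names pred and obs are hypothetical and used only for this example.

```sas
/*hypothetical toy data: three predicted and observed values*/
data toy;
    input pred obs;
    datalines;
2 1
4 4
6 8
;
run;

/*RMSE = square root of the mean squared difference*/
proc sql;
    select sqrt(mean((pred - obs)**2)) as rmse
    from toy;
quit;
```

The squared differences are 1, 0, and 4, so the mean is 5/3 and the RMSE is about 1.291.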

The following step-by-step example shows how to calculate the RMSE for a simple linear regression model in SAS.

Step 1: Create the Data

For this example, we’ll create a dataset that contains the total hours studied and final exam score for 15 students.

We’ll then fit a simple linear regression model using hours as the predictor variable and score as the response variable.

The following code shows how to create this dataset in SAS:

/*create dataset*/
data exam_data;
    input hours score;
    datalines;
1 64
2 66
4 76
5 73
5 74
6 81
6 83
7 82
8 80
10 88
11 84
11 82
12 91
12 93
14 89
;
run;

/*view dataset*/
proc print data=exam_data;
run;

Step 2: Fit the Simple Linear Regression Model

Next, we’ll use proc reg to fit the simple linear regression model:

/*fit simple linear regression model*/
proc reg data=exam_data;
    model score = hours;
run;
quit;

[Image: simple linear regression output in SAS]

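Notice that the RMSE, labeled Root MSE in the output, is 3.64093. SAS divides the sum of squared residuals by the error degrees of freedom (n – 2 for this model) rather than by n, so this value is slightly larger than the one the formula above would give.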
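
As a rough check on the reported value, the residuals can be saved with an output statement and the RMSE recomputed by hand. The sketch below does this with two denominators, n and n – 2. The dataset name fitted and the variable names pred, resid, rmse_n, and rmse_df are illustrative choices, not part of the original example.

```sas
/*save predicted values and residuals from the fitted model*/
proc reg data=exam_data noprint;
    model score = hours;
    output out=fitted p=pred r=resid;
run;
quit;

/*compute RMSE with n and with n - 2 in the denominator*/
proc sql;
    select sqrt(sum(resid**2) / count(resid))       as rmse_n,
           sqrt(sum(resid**2) / (count(resid) - 2)) as rmse_df
    from fitted;
quit;
```

For this dataset, rmse_df should match the 3.64093 reported by proc reg, while rmse_n comes out lower, at roughly 3.39.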

Step 3: Extract RMSE from Regression Model

If you only want to view the RMSE of this model and none of the other output results, you can use the following code:

/*fit simple linear regression model*/
proc reg data=exam_data outest=outest noprint;
    model score = hours / rmse;
run;
quit;

/*print RMSE of model*/
proc print data=outest;
    var _RMSE_;
run;

[Image: RMSE output in SAS]

Notice that only the RMSE value of 3.64093 is shown in the output.

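Note: The noprint option in the proc reg statement tells SAS not to print the full regression output that it displayed in the previous step, and the outest= option saves the model estimates, including the _RMSE_ variable, to a dataset.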
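
If the RMSE is needed later in a program, one option is to copy it from the outest dataset into a macro variable. This is a minimal sketch that assumes the outest dataset created above holds a single row; the macro variable name rmse is an illustrative choice.

```sas
/*store the RMSE in a macro variable (the name rmse is illustrative)*/
proc sql noprint;
    select _RMSE_ into :rmse trimmed
    from outest;
quit;

/*write the value to the SAS log*/
%put RMSE = &rmse;
```

The value can then be referenced as &rmse in titles, footnotes, or later steps.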

Additional Resources

The following tutorials explain how to perform other common tasks in SAS:

How to Perform Simple Linear Regression in SAS
How to Perform Multiple Linear Regression in SAS
How to Perform Polynomial Regression in SAS
How to Perform Logistic Regression in SAS
