One way to assess how well a regression model fits a dataset is to calculate the **root mean square error**, a metric that tells us the typical distance between the values predicted by the model and the actual values in the dataset. It is calculated as the square root of the average squared difference between the two.

The lower the RMSE, the better a given model is able to “fit” a dataset.

The formula to find the root mean square error, often abbreviated **RMSE**, is as follows:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}}$$

where:

- Σ is a symbol that represents “sum”
- $P_i$ is the predicted value for the i-th observation in the dataset
- $O_i$ is the observed value for the i-th observation in the dataset
- n is the sample size
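
As a quick illustration of the arithmetic, here is the formula applied to three made-up observations (hypothetical numbers, not taken from the dataset used below), with predicted values 2, 4, 6 and observed values 3, 4, 8:

$$\text{RMSE} = \sqrt{\frac{(2-3)^2 + (4-4)^2 + (6-8)^2}{3}} = \sqrt{\frac{1 + 0 + 4}{3}} = \sqrt{\frac{5}{3}} \approx 1.29$$

In other words, the predictions in this toy example miss the observed values by roughly 1.29 units on a typical observation.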

The following step-by-step example shows how to calculate the RMSE for a simple linear regression model in SAS.

**Step 1: Create the Data**

For this example, we’ll create a dataset that contains the total hours studied and final exam score for 15 students.

We’ll fit a simple linear regression model using *hours* as the predictor variable and *score* as the response variable.

The following code shows how to create this dataset in SAS:

    /*create dataset*/
    data exam_data;
        input hours score;
        datalines;
    1 64
    2 66
    4 76
    5 73
    5 74
    6 81
    6 83
    7 82
    8 80
    10 88
    11 84
    11 82
    12 91
    12 93
    14 89
    ;
    run;

    /*view dataset*/
    proc print data=exam_data;
    run;

**Step 2: Fit the Simple Linear Regression Model**

Next, we’ll use **proc reg** to fit the simple linear regression model:

    /*fit simple linear regression model*/
    proc reg data=exam_data;
        model score = hours;
    run;

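Notice that the RMSE in the output, which SAS labels **Root MSE**, is **3.64093**.

SAS calculates this value by dividing the sum of squared residuals by the error degrees of freedom (n - 2 = 13 for this model) rather than by n, so it is slightly larger than the value that the formula above would produce.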
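
To see how the reported value relates to the formula given earlier, the RMSE can also be computed by hand from the model residuals. The following is a minimal sketch, not part of the original example: the names `pred_data`, `p_score`, `resid`, `rmse_n`, and `rmse_df` are chosen here purely for illustration. It computes the statistic twice, once dividing by n as in the formula and once dividing by n - 2, the error degrees of freedom for a model with an intercept and one predictor.

```sas
/*save predicted values and residuals (dataset and variable names are illustrative)*/
proc reg data=exam_data noprint;
    model score = hours;
    output out=pred_data p=p_score r=resid;
run;
quit;

/*compute RMSE by hand: dividing by n, and by n - 2 (error degrees of freedom)*/
proc sql;
    select sqrt(sum(resid**2) / count(*))       as rmse_n,
           sqrt(sum(resid**2) / (count(*) - 2)) as rmse_df
    from pred_data;
quit;
```

The `rmse_df` column should reproduce the **3.64093** shown by **proc reg**, while `rmse_n`, which follows the formula with n in the denominator, comes out slightly smaller (about 3.39).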

**Step 3: Extract RMSE from Regression Model**

If you only want to view the RMSE of this model and none of the other output results, you can use the following code:

    /*fit simple linear regression model*/
    proc reg data=exam_data outest=outest noprint;
        model score = hours / rmse;
    run;
    quit;

    /*print RMSE of model*/
    proc print data=outest;
        var _RMSE_;
    run;

Notice that only the RMSE value of **3.64093** is shown in the output.

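**Note**: The argument **noprint** in **proc reg** tells SAS not to print the entire output of regression results as it did in the previous step. The **outest** argument saves the parameter estimates and fit statistics, including the **_RMSE_** variable, to a dataset that we can then print.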
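
If the RMSE is needed later in a program rather than just printed, one option is to copy it from the **outest** dataset into a macro variable. The following is a minimal sketch that assumes the `outest` dataset from the previous step exists; the macro variable name `model_rmse` is hypothetical and chosen here only for illustration.

```sas
/*store the RMSE in a macro variable (model_rmse is an illustrative name)*/
data _null_;
    set outest;
    call symputx('model_rmse', _RMSE_);
run;

/*write the stored value to the SAS log*/
%put NOTE: RMSE of the model is &model_rmse;
```

The value can then be referenced as `&model_rmse` in titles, footnotes, or later steps without rerunning the regression.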

