**Cook’s distance** is used to identify influential observations in a regression model.

The formula for Cook’s distance is:

$$D_i = \frac{r_i^2}{p \cdot \text{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$$

where:

- $r_i$ is the *i*th residual
- $p$ is the number of coefficients in the regression model
- $\text{MSE}$ is the mean squared error
- $h_{ii}$ is the *i*th leverage value

Essentially, Cook’s distance measures how much all of the fitted values in the model change when the *i*th observation is deleted.
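This deletion-based interpretation can also be written as a formula. The expression below is the standard textbook definition rather than something stated in this tutorial; the symbol $\hat{y}_{j(i)}$ (the fitted value for observation $j$ when the model is refit without observation $i$) is notation introduced here for illustration:

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \cdot \text{MSE}}
```

For a linear regression model this is algebraically equal to the residual-and-leverage formula, so the model does not actually need to be refit *n* times.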

The larger the value for Cook’s distance, the more influential a given observation.

A rule of thumb is that any observation with a Cook’s distance greater than 4/n (where *n* = total observations) is considered to be highly influential.

The following example shows how to calculate Cook’s distance for each observation in a regression model in SAS.

**Example: Calculating Cook’s Distance in SAS**

Suppose we have the following dataset in SAS:

**/*create dataset*/
data my_data;
input x y;
datalines;
8 41
12 42
12 39
13 37
14 35
16 39
17 45
22 46
24 39
26 49
29 55
30 57
;
run;
/*view dataset*/
proc print data=my_data;
run;
**

We can use **PROC REG** to fit a simple linear regression model to this dataset, and then use the **COOKD=** option in the **OUTPUT** statement to save Cook’s distance for each observation to a new dataset:

**/*fit simple linear regression model and calculate Cook's distance for each obs*/
proc reg data=my_data;
model y=x;
output out=cooksData cookd=cookd;
run;
/*print Cook's distance values for each observation*/
proc print data=cooksData;
run;
**

The final table in the output displays the original dataset along with Cook’s distance for each observation:

For example, we can see:

- Cook’s distance for the first observation is **0.36813**.
- Cook’s distance for the second observation is **0.06075**.
- Cook’s distance for the third observation is **0.00052**.

And so on.
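To connect these values back to the formula, the following is a minimal sketch that recomputes Cook’s distance by hand from the residuals and leverage values. It assumes the `my_data` dataset from above; the dataset names `est`, `diag`, and `check` and the variable names `resid`, `lev`, and `cookd_manual` are made up for this illustration. It relies on the `_RMSE_` variable that **PROC REG** writes to the **OUTEST=** dataset, and sets *p* = 2 because the model has an intercept and one slope.

```sas
/*refit the model, saving residuals, leverage, and Cook's distance*/
/*OUTEST= stores the root mean squared error in the variable _RMSE_*/
proc reg data=my_data outest=est noprint;
    model y=x;
    output out=diag r=resid h=lev cookd=cookd;
run;
quit;

/*recompute Cook's distance from the formula for comparison*/
data check;
    if _n_ = 1 then set est(keep=_RMSE_); /*value is retained for every row*/
    set diag;
    p = 2;                                /*intercept + slope (assumption for this model)*/
    mse = _RMSE_**2;
    cookd_manual = (resid**2 / (p*mse)) * (lev / (1 - lev)**2);
run;

/*view the saved and hand-computed values side by side*/
proc print data=check;
    var x y resid lev cookd cookd_manual;
run;
```

If the formula is applied correctly, the `cookd` and `cookd_manual` columns should agree up to rounding.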

**PROC REG** also produces several diagnostic plots in the output, including a chart of Cook’s distance:

The x-axis shows the observation number and the y-axis shows Cook’s distance for each observation.
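If only the Cook’s distance chart is wanted, one option is to request it on its own through the **PLOTS=** option. This is a sketch that assumes ODS Graphics is available in your SAS session; the `label` suboption is optional and is meant to label the points that SAS treats as influential.

```sas
/*enable ODS Graphics and request only the Cook's distance plot*/
ods graphics on;

proc reg data=my_data plots(only)=cooksd(label);
    model y=x;
run;
quit;
```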

Note that a cutoff line is placed at 4/n (in this case n = 12, so the cutoff is 4/12 ≈ 0.33), and we can see that three observations in the dataset have a Cook’s distance above this line.

This indicates that these observations could be highly influential to the regression model and should perhaps be examined more closely before interpreting the output of the model.
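The same 4/n rule can be applied directly to the saved values instead of reading them off the chart. The sketch below assumes the `cooksData` dataset created earlier; the names `flagged`, `cutoff`, and `influential` are hypothetical, and *n* is taken to be the number of rows in the dataset.

```sas
/*flag observations whose Cook's distance exceeds 4/n*/
data flagged;
    set cooksData nobs=n;           /*NOBS= stores the number of rows in n*/
    cutoff = 4 / n;
    influential = (cookd > cutoff); /*1 if above the cutoff, otherwise 0*/
run;

/*print only the flagged observations*/
proc print data=flagged;
    where influential = 1;
run;
```

Note that `n` counts every row in `cooksData`, so if some rows were dropped from the fit because of missing values, the cutoff would need to be based on the number of observations actually used.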
