How to Create a Correlation Matrix in SAS (With Example)


A correlation matrix is a square table that shows the correlation coefficients between variables in a dataset.

It offers a quick way to understand the strength of the linear relationships that exist between variables in a dataset.
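For reference, the coefficient most often shown in a correlation matrix is the Pearson correlation coefficient. A standard way to write it for two variables x and y with n paired observations is sketched below; the symbols are generic notation, not names taken from any dataset in this tutorial:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r always falls between -1 and 1. Values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one.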

You can use PROC CORR (the CORR procedure) in SAS to create a correlation matrix for a given dataset:

/*create correlation matrix using all numeric variables in my_data*/
proc corr data=my_data;
run;

By default, this will create a matrix that displays the correlation coefficients between all numeric variables in the dataset.
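The coefficients produced by default are Pearson correlations. As a minimal sketch, assuming the same placeholder dataset my_data, the PEARSON and SPEARMAN options can be listed together to request rank-based Spearman correlations alongside the Pearson ones:

```sas
/*request both Pearson and Spearman correlation matrices*/
/*my_data is the same placeholder dataset name used above*/
proc corr data=my_data pearson spearman;
run;
```

If only SPEARMAN is listed, the Pearson matrix is not produced, so both options are named here.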

To only include specific variables in the correlation matrix, you can use the VAR statement:

/*create correlation matrix using only var1, var2 and var3 in my_data*/
proc corr data=my_data;
    var var1 var2 var3;
run;

The following example shows how to create a correlation matrix in SAS.

Example: Creating a Correlation Matrix in SAS

Suppose we have the following dataset in SAS that contains information about various basketball players:

/*create dataset*/
data my_data;
    input team $ assists rebounds points;
    datalines;
A 4 12 22
A 5 14 24
A 5 13 26
A 6 7 26
B 7 8 29
B 8 8 32
B 8 9 20
B 10 13 14
;
run;

/*view dataset*/
proc print data=my_data;
run;

We can use PROC CORR to create a correlation matrix that includes each numeric variable in the dataset by default:

/*create correlation matrix using all numeric variables in my_data*/
proc corr data=my_data;
run;


The output displays summary statistics for the numeric variables, followed by the correlation matrix. Each cell of the matrix shows a correlation coefficient with its p-value underneath.
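If only the correlation matrix is of interest, the summary statistics table can be suppressed. A minimal sketch, assuming the my_data dataset created earlier, uses the NOSIMPLE option:

```sas
/*create correlation matrix without the summary statistics table*/
proc corr data=my_data nosimple;
run;
```

The correlation coefficients and p-values themselves are unchanged; only the descriptive table is dropped from the output.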

Note that the “team” variable was not included in the correlation matrix because it is a character variable, not a numeric one.

Here is how to interpret the values in the correlation matrix:

(1) The Pearson correlation coefficient (r) between assists and rebounds is -0.24486. The corresponding p-value is 0.5589.

Since r is less than zero, there is a negative linear association between these two variables. However, the p-value is not less than 0.05, so this correlation is not statistically significant at the 0.05 level.

(2) The Pearson correlation coefficient (r) between assists and points is -0.32957. The corresponding p-value is 0.4253.

There is a negative linear association between these two variables but it is not statistically significant.

(3) The Pearson correlation coefficient (r) between rebounds and points is -0.52209. The corresponding p-value is 0.1844.

There is a negative linear association between these two variables but it is not statistically significant.
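To see where these p-values come from, the first one can be recomputed by hand. The sketch below assumes the usual t-test for a Pearson correlation with n - 2 degrees of freedom; the dataset name p_check and the variables r, n, t and p are hypothetical names chosen for illustration:

```sas
/*recompute the two-sided p-value for assists vs. rebounds*/
data p_check;
    r = -0.24486;                          /*correlation from the matrix*/
    n = 8;                                 /*number of observations*/
    t = r * sqrt(n - 2) / sqrt(1 - r**2);  /*t statistic with n-2 degrees of freedom*/
    p = 2 * (1 - probt(abs(t), n - 2));    /*two-sided p-value*/
run;

/*view the result*/
proc print data=p_check;
run;
```

The printed value of p should be close to the 0.5589 reported in the correlation matrix. Swapping in the other two coefficients gives the same kind of check for the remaining pairs.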

Note that we could also use the VAR statement to only include specific numeric variables in the correlation matrix:

/*create correlation matrix using only assists and rebounds variables*/
proc corr data=my_data;
    var assists rebounds;
run;

Notice that only the assists and rebounds variables were included in this correlation matrix.
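It can also be useful to save the correlation matrix as a dataset for later steps. A minimal sketch, assuming the my_data dataset from this example, uses the OUTP= option; the output dataset name corr_out is a hypothetical choice:

```sas
/*save the Pearson correlation matrix to a dataset instead of printing it*/
proc corr data=my_data outp=corr_out noprint;
    var assists rebounds points;
run;

/*view only the rows that hold correlation coefficients*/
proc print data=corr_out;
    where _type_ = 'CORR';
run;
```

The OUTP= dataset also contains rows for the mean, standard deviation and number of observations, which is why the WHERE statement filters on _TYPE_.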

Additional Resources

The following tutorials explain how to perform other common tasks in SAS:

How to Create a Scatter Plot Matrix in SAS
How to Create Pivot Tables in SAS
How to Calculate Variance Inflation Factor (VIF) in SAS
