A correlation matrix is a square table that shows the correlation coefficients between variables in a dataset.

It offers a quick way to gauge the strength and direction of the linear relationship between each pair of variables.

You can use the **PROC CORR** statement in SAS to create a correlation matrix for a given dataset:

**/*create correlation matrix using all numeric variables in my_data*/
proc corr data=my_data;
run;**

By default, this will create a matrix that displays the correlation coefficients between all numeric variables in the dataset.

To only include specific variables in the correlation matrix, you can use the **VAR** statement:

**/*create correlation matrix using only var1, var2 and var3 in my_data*/
proc corr data=my_data;
var var1 var2 var3;
run;**

The following example shows how to create a correlation matrix in SAS.

**Example: Creating a Correlation Matrix in SAS**

Suppose we have the following dataset in SAS that contains information about various basketball players:

**/*create dataset*/
data my_data;
input team $ assists rebounds points;
datalines;
A 4 12 22
A 5 14 24
A 5 13 26
A 6 7 26
B 7 8 29
B 8 8 32
B 8 9 20
B 10 13 14
;
run;
/*view dataset*/
proc print data=my_data;
run;**

We can use the **PROC CORR** statement to create a correlation matrix that includes each numeric variable in the dataset by default:

**/*create correlation matrix using all numeric variables in my_data*/
proc corr data=my_data;
run;**

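The output displays summary statistics for the numeric variables in the first table, followed by the correlation matrix. Each cell of the matrix shows the correlation coefficient for a pair of variables with the corresponding p-value beneath it.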
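If only the correlation matrix is of interest, the summary statistics table can be suppressed. The following is a minimal sketch that assumes the same `my_data` dataset and uses the `NOSIMPLE` option of PROC CORR:

```sas
/*create correlation matrix without the summary statistics table*/
proc corr data=my_data nosimple;
run;
```

The correlation coefficients and p-values are unchanged; only the descriptive statistics are left out of the output.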

Note that the “team” variable was not included in the correlation matrix because it is a character variable, not a numeric one.
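To check which variables SAS treats as numeric before running the correlation, one option is PROC CONTENTS. This is a short sketch that assumes the same `my_data` dataset:

```sas
/*list each variable in my_data along with its type (Num or Char)*/
proc contents data=my_data;
run;
```

Variables listed with a type of Char, such as team here, are the ones left out of the default correlation matrix.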

Here is how to interpret the values in the correlation matrix:

**(1)** The Pearson correlation coefficient (r) between **assists** and **rebounds** is **-0.24486**. The corresponding p-value is **0.5589**.

Since r is less than zero, there is a negative linear association between these two variables. However, the p-value is not less than .05, so this correlation is not statistically significant.

**(2)** The Pearson correlation coefficient (r) between **assists** and **points** is **-0.32957**. The corresponding p-value is **0.4253**.

There is a negative linear association between these two variables, but it is not statistically significant.

**(3)** The Pearson correlation coefficient (r) between **rebounds** and **points** is **-0.52209**. The corresponding p-value is **0.1844**.

There is a negative linear association between these two variables, but it is not statistically significant.
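To see where these p-values come from, we can reproduce one by hand. The sketch below assumes the usual t-test for a Pearson correlation, in which t = r·sqrt(n − 2) / sqrt(1 − r²) follows a t distribution with n − 2 degrees of freedom. The dataset name `pvalue_check` and its variable names are made up for this illustration:

```sas
/*recompute the two-sided p-value for the assists-rebounds correlation*/
data pvalue_check;
    r  = -0.24486;                        /*correlation taken from the matrix above*/
    n  = 8;                               /*number of observations in my_data*/
    df = n - 2;                           /*degrees of freedom*/
    t  = r * sqrt(df) / sqrt(1 - r**2);   /*t statistic for the correlation*/
    p  = 2 * (1 - probt(abs(t), df));     /*two-sided p-value from the t distribution*/
run;

/*view the result*/
proc print data=pvalue_check;
run;
```

The value of p should come out close to the 0.5589 reported in the matrix, with any small difference due to the rounding of r.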

Note that we could also use the **VAR** statement to only include specific numeric variables in the correlation matrix:

**/*create correlation matrix using only assists and rebounds variables*/
proc corr data=my_data;
var assists rebounds;
run;**

Notice that only the **assists** and **rebounds** variables were included in this correlation matrix.
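A related option is the `WITH` statement, which pairs one set of variables against another instead of producing a full square matrix. The following sketch assumes we only want to see how points correlates with assists and rebounds:

```sas
/*correlate points (rows) with assists and rebounds (columns)*/
proc corr data=my_data;
    var assists rebounds;
    with points;
run;
```

In this layout the variables in the WITH statement form the rows of the table and the variables in the VAR statement form the columns.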

**Additional Resources**

The following tutorials explain how to perform other common tasks in SAS:

How to Create a Scatter Plot Matrix in SAS

How to Create Pivot Tables in SAS

How to Calculate Variance Inflation Factor (VIF) in SAS