You can use the following methods to calculate correlation coefficients in R when one or more variables have missing values:

**Method 1: Calculate Correlation Coefficient with Missing Values Present**

**cor(x, y, use='complete.obs')**

**Method 2: Calculate Correlation Matrix with Missing Values Present**

**cor(df, use='pairwise.complete.obs')**

The following examples show how to use each method in practice.

**Example 1: Calculate Correlation Coefficient with Missing Values Present**

Suppose we attempt to use the **cor()** function to calculate the Pearson correlation coefficient between two variables when missing values are present:

**#create two variables
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)
#attempt to calculate correlation coefficient between x and y
cor(x, y)
[1] NA**

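The **cor()** function returns **NA** because we didn’t specify how to handle missing values. By default (**use='everything'**), the result is **NA** whenever either variable contains a missing value.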

To avoid this issue, we can use the argument **use='complete.obs'** so that R uses only the observations where both **x** and **y** are present:

**#create two variables
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)
#calculate correlation coefficient between x and y
cor(x, y, use='complete.obs')
[1] -0.4888749
**

The correlation coefficient between the two variables turns out to be **-0.4888749**.

Note that the **cor()** function used only the eight pairs of observations in which both values were present when calculating the correlation coefficient.
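To see what **use='complete.obs'** does for two vectors, the following minimal sketch drops the incomplete pairs by hand with **complete.cases()** and checks that the result agrees with the built-in handling. The names `keep`, `n_pairs`, `r_manual`, and `r_builtin` are introduced here only for illustration.

```r
# same two variables as in the example above
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)

# TRUE at positions where both x and y are present ('keep' is an illustrative name)
keep <- complete.cases(x, y)

# number of pairs left after dropping the incomplete ones
n_pairs <- sum(keep)

# correlation computed on the complete pairs only
r_manual <- cor(x[keep], y[keep])

# correlation using the built-in handling of missing values
r_builtin <- cor(x, y, use = 'complete.obs')

# check that the two approaches agree
all.equal(r_manual, r_builtin)
```

Filtering by hand like this can be handy when you also want to report how many pairs the coefficient is based on.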

**Example 2: Calculate Correlation Matrix with Missing Values Present**

Suppose we attempt to use the **cor()** function to create a correlation matrix for a data frame with three variables when missing values are present:

**#create data frame with some missing values
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))
#attempt to create correlation matrix for variables in data frame
cor(df)
   x  y  z
x  1 NA NA
y NA  1 NA
z NA NA  1
**

The **cor()** function returns **NA** for every pair of variables since each column contains a missing value and we didn’t specify how to handle missing values.

To avoid this issue, we can use the argument **use='pairwise.complete.obs'** so that, for each pair of variables, R uses only the rows where both values are present:

**#create data frame with some missing values
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))
#create correlation matrix for variables using only pairwise complete observations
cor(df, use='pairwise.complete.obs')
           x          y          z
x  1.0000000 -0.4888749  0.1311651
y -0.4888749  1.0000000 -0.1562371
z  0.1311651 -0.1562371  1.0000000
**

The correlation coefficients for each pairwise combination of variables in the data frame are now shown.
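One point worth keeping in mind is that, for a data frame, **use='complete.obs'** and **use='pairwise.complete.obs'** are not interchangeable. The first drops every row that has a missing value in any column, while the second decides which rows to keep separately for each pair of variables. The sketch below compares the two on the same data frame; the names `listwise`, `pairwise`, `n_listwise`, `present`, and `n_pairwise` are illustrative only.

```r
# same data frame as in the example above
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
                 y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
                 z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))

# listwise deletion: only rows with no missing value in any column are used
listwise <- cor(df, use = 'complete.obs')

# pairwise deletion: each coefficient uses the rows complete for that pair
pairwise <- cor(df, use = 'pairwise.complete.obs')

# number of rows that are complete across all columns
n_listwise <- sum(complete.cases(df))

# number of rows available for each pair of variables:
# 'present' holds 1 where a value exists and 0 where it is NA
present <- ifelse(is.na(as.matrix(df)), 0, 1)
n_pairwise <- crossprod(present)

# compare the x-y coefficient under the two approaches
listwise['x', 'y']
pairwise['x', 'y']
```

With this data, seven rows are complete across all three columns, while each pair of variables shares eight complete rows, so the two matrices are computed from different subsets and their coefficients differ. Pairwise deletion keeps more data, but each entry of the matrix may then describe a slightly different set of observations.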

