# How to Calculate Point-Biserial Correlation in R

Point-biserial correlation is used to measure the relationship between a binary variable, x, and a continuous variable, y.

Similar to the Pearson correlation coefficient, the point-biserial correlation coefficient takes on a value between -1 and 1 where:

• -1 indicates a perfectly negative correlation between two variables
• 0 indicates no correlation between two variables
• 1 indicates a perfectly positive correlation between two variables

This tutorial explains how to calculate the point-biserial correlation between two variables in R.

## Example: Point-Biserial Correlation in R

Suppose we have a binary variable, x, and a continuous variable, y:

x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)

y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)


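The point-biserial correlation is mathematically equivalent to the Pearson correlation computed with a binary variable coded as 0 and 1. For that reason, we can use the built-in R function cor.test() to calculate the point-biserial correlation between the two variables: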

#calculate point-biserial correlation
cor.test(x, y)

Pearson's product-moment correlation

data:  x and y
t = 0.67064, df = 9, p-value = 0.5193

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:
-0.4391885  0.7233704

sample estimates:
cor
0.2181635


From the output we can observe the following:

• The point-biserial correlation coefficient is 0.218
• The corresponding p-value is 0.5193
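If you would rather pull these values out of the result than read them off the printed output, the object returned by cor.test() can be indexed directly. The following is a minimal sketch that reuses the x and y vectors from above; the names res, r_pb, p_val, and ci are illustrative choices, not part of the original example.

```r
# Same data as in the example above
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)

# cor.test() returns a list of class "htest"
res <- cor.test(x, y)

r_pb  <- unname(res$estimate)    # correlation coefficient
p_val <- res$p.value             # two-sided p-value
ci    <- as.numeric(res$conf.int) # lower and upper 95% bounds

round(r_pb, 3)
round(p_val, 4)
round(ci, 3)
```

Storing the pieces this way makes it easier to reuse them in a report or a later calculation.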

Since the correlation coefficient is positive, the variable y tends to take on higher values when x equals “1” than when x equals “0.”
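A quick way to check this interpretation is to compare the mean of y within each group of x. This is a small sketch that assumes the same x and y vectors as above; the name group_means is illustrative.

```r
# Same data as in the example above
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)

# Mean of y for each value of x (names of the result are "0" and "1")
group_means <- tapply(y, x, mean)
group_means

# Difference in means: group 1 minus group 0
unname(group_means["1"] - group_means["0"])
```

For this data the group means are about 14.17 (x = 0) and 16.2 (x = 1), which matches the positive sign of the coefficient.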

However, since the p-value of this correlation is not less than .05, this correlation is not statistically significant.
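It may help to know that the significance test for a point-biserial correlation should give the same p-value as a two-sample t-test with pooled variance comparing y between the two groups. The sketch below checks this on the example data; it assumes the same x and y vectors, and the names g, ct, and tt are illustrative.

```r
# Same data as in the example above
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)

# Treat the binary variable as a grouping factor
g <- factor(x)

ct <- cor.test(x, y)                  # test of the correlation
tt <- t.test(y ~ g, var.equal = TRUE) # pooled-variance (Student) t-test

# The p-values agree; the t statistics agree up to sign, because
# t.test() reports group "0" minus group "1"
all.equal(ct$p.value, tt$p.value)
all.equal(abs(unname(ct$statistic)), abs(unname(tt$statistic)))
```

Note that var.equal = TRUE matters here: the default Welch test in t.test() uses a different standard error and would generally not reproduce the same p-value.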

Note that the output also provides a 95% confidence interval for the true correlation coefficient, which turns out to be:

95% C.I. = (-0.439, 0.723)

Since this confidence interval contains zero, it is consistent with the conclusion that the correlation coefficient is not statistically significant.
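For readers curious where this interval comes from: cor.test() documents its confidence interval as an asymptotic one based on Fisher's z transformation. The following sketch reproduces it by hand under that assumption, using the same x and y vectors; the names r, n, z, se, and ci_manual are illustrative.

```r
# Same data as in the example above
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)

r <- cor(x, y)     # point-biserial (Pearson) correlation
n <- length(x)     # number of paired observations

z  <- atanh(r)         # Fisher z transformation of r
se <- 1 / sqrt(n - 3)  # approximate standard error of z

# 95% interval on the z scale, transformed back to the correlation scale
ci_manual <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci_manual

# Compare with the interval reported by cor.test()
as.numeric(cor.test(x, y)$conf.int)
```

Because the interval relies on this approximation while the p-value comes from a t-distribution, the two can occasionally disagree in borderline cases, although they agree here.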

Note: You can find the complete documentation for the cor.test() function by running ?cor.test in R.


## 2 Replies to “How to Calculate Point-Biserial Correlation in R”

1. Armita Shahesmaeili says:

Hello. Thank you very much for the syntax. I supposed cor.test() doesn't work for point-biserial correlation. Is there a separate command?

1. James Carmichael says:

Hi Armita… cor.test() does work for point-biserial correlation. By default it calculates Pearson’s correlation coefficient, and when one of the two variables is binary (coded as 0 and 1), Pearson’s r is mathematically identical to the point-biserial correlation coefficient. The value returned in the example above is therefore the point-biserial correlation, so a separate command is not required.

You can calculate the point-biserial correlation coefficient directly using the formula:

$r_{pb} = \frac{M_1 - M_0}{s_{n-1}} \sqrt{\frac{N_1 N_0}{N(N-1)}}$

Where:
– $$M_1$$ is the mean of the continuous variable for the group with a value of 1 (e.g., the mean of the continuous variable for the group with the characteristic you’re interested in),
– $$M_0$$ is the mean of the continuous variable for the group with a value of 0,
– $$s_{n-1}$$ is the sample standard deviation of the continuous variable across all observations (computed with $$N-1$$ in the denominator),
– $$N_1$$ is the number of observations in the group with a value of 1,
– $$N_0$$ is the number of observations in the group with a value of 0,
– $$N$$ is the total number of observations.
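As a rough check, this calculation can be carried out by hand in R and compared with cor(). The sketch below assumes the usual textbook form of the formula, in which the difference in group means is divided by the sample standard deviation of y (as returned by sd()) and multiplied by the square root of N1·N0 / (N(N-1)). It reuses x and y from the example above, and the variable names are illustrative.

```r
# Same data as in the example above
x <- c(0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0)
y <- c(12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12)

m1 <- mean(y[x == 1])  # mean of y in the group coded 1
m0 <- mean(y[x == 0])  # mean of y in the group coded 0
n1 <- sum(x == 1)      # group sizes
n0 <- sum(x == 0)
n  <- length(y)        # total number of observations
s  <- sd(y)            # sample standard deviation (n - 1 denominator)

# Point-biserial correlation from the formula
r_manual <- (m1 - m0) / s * sqrt(n1 * n0 / (n * (n - 1)))
r_manual

# Pearson correlation on the same data for comparison
cor(x, y)
```

Both lines should print the same value (about 0.218 for this data), which illustrates that the formula and Pearson's r coincide for a 0/1 variable.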

You can then perform hypothesis tests or calculate confidence intervals for the point-biserial correlation coefficient using standard formulas. In R, cor.test() already reports both the p-value and the confidence interval. Packages such as psych or ltm offer related functions, but they are not required to calculate the point-biserial correlation coefficient.