How to Calculate Point-Biserial Correlation in Python


Point-biserial correlation is used to measure the relationship between a binary variable, x, and a continuous variable, y.

Similar to the Pearson correlation coefficient, the point-biserial correlation coefficient takes on a value between -1 and 1 where:

  • -1 indicates a perfectly negative correlation between two variables
  • 0 indicates no correlation between two variables
  • 1 indicates a perfectly positive correlation between two variables

This tutorial explains how to calculate the point-biserial correlation between two variables in Python.

Example: Point-Biserial Correlation in Python

Suppose we have a binary variable, x, and a continuous variable, y:

x = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
y = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

We can use the pointbiserialr() function from the scipy.stats library to calculate the point-biserial correlation between the two variables.

Note that this function returns a correlation coefficient along with a corresponding p-value:

import scipy.stats as stats

#calculate point-biserial correlation
stats.pointbiserialr(x, y)

PointbiserialrResult(correlation=0.21816, pvalue=0.51928)

The point-biserial correlation coefficient is 0.21816 and the corresponding p-value is 0.51928.

Since the correlation coefficient is positive, this indicates that when the variable x takes on the value “1” that the variable y tends to take on higher values compared to when the variable x takes on the value “0.”

Since the p-value of this correlation is not less than .05, this correlation is not statistically significant. 

You can find the exact details of how this correlation is calculated in the scipy.stats documentation.

Leave a Reply

Your email address will not be published.