One way to quantify the relationship between two variables is to use the Pearson correlation coefficient, which is a measure of the linear association between two variables. It always takes on a value between -1 and 1 where:
- -1 indicates a perfect negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfect positive linear correlation between two variables
To determine if a correlation coefficient is statistically significant, you can calculate the corresponding t-score and p-value.
The formula to calculate the t-score of a correlation coefficient (r) computed from n paired observations is:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
The p-value is calculated as the corresponding two-sided p-value for the t-distribution with n-2 degrees of freedom.
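As a rough illustration of the two steps above, the calculation can be sketched in base R. The helper name `cor_t_test` and the inputs r = 0.5 and n = 30 are hypothetical, chosen only for demonstration.

```r
# Hypothetical helper (not part of base R): t-score and two-sided p-value
# for a Pearson correlation r computed from n paired observations
cor_t_test <- function(r, n) {
  df <- n - 2                               # degrees of freedom
  t_score <- r * sqrt(df) / sqrt(1 - r^2)   # t = r*sqrt(n-2) / sqrt(1-r^2)
  p_value <- 2 * pt(-abs(t_score), df)      # area in both tails of the t-distribution
  list(t = t_score, df = df, p.value = p_value)
}

# Illustrative values only (assumed, not taken from real data)
res <- cor_t_test(r = 0.5, n = 30)
print(res)
```

Taking the absolute value of the t-score before calling `pt()` keeps the p-value two-sided regardless of the sign of r.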
Correlation Test in R
To determine if the correlation coefficient between two variables is statistically significant, you can perform a correlation test in R using the following syntax:
cor.test(x, y, method=c("pearson", "kendall", "spearman"))
- x, y: Numeric vectors of data
- method: Method used to calculate the correlation between the two vectors. The default is "pearson".
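As a small sketch of how the method argument might be set, the snippet below runs all three tests on two made-up vectors. The vectors `a` and `b` are hypothetical and contain no tied values, so the rank-based methods can compute exact p-values.

```r
# Hypothetical data with no ties (for illustration only)
a <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.3, 8.9)
b <- c(2.1, 1.9, 3.5, 4.2, 6.1, 5.8, 8.0, 7.7)

# Pearson is used when method is not specified
pearson_res  <- cor.test(a, b)
spearman_res <- cor.test(a, b, method = "spearman")
kendall_res  <- cor.test(a, b, method = "kendall")

# Each result labels its estimate differently: cor, rho, tau
pearson_res$estimate
spearman_res$estimate
kendall_res$estimate
```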
The following example shows how to use this function to perform a correlation test in R.
Example: Correlation Test in R
Suppose we have the following two vectors in R:
x <- c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23)
y <- c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43)
Before we perform a correlation test between the two variables, we can create a quick scatterplot to view their relationship:
#create scatterplot
plot(x, y, pch=16)
There appears to be a positive correlation between the two variables. That is, as one increases the other tends to increase as well.
To see if this correlation is statistically significant, we can perform a correlation test:
#perform correlation test between the two vectors
cor.test(x, y)

    Pearson's product-moment correlation

data:  x and y
t = 7.8756, df = 10, p-value = 1.35e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7575203 0.9799783
sample estimates:
      cor
0.9279869
The correlation coefficient between the two vectors turns out to be 0.9279869.
The test statistic turns out to be 7.8756 with 10 degrees of freedom (n - 2 = 12 - 2), and the corresponding p-value is 1.35e-05. Since this p-value is less than 0.05, we have sufficient evidence to say that the correlation between the two variables is statistically significant.
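The individual pieces of the output can also be pulled out of the object that cor.test() returns, which is handy when the values feed into later code. The sketch below repeats the vectors from the example so that it runs on its own; the variable names `result`, `r_hat`, `t_stat`, `p_val`, `ci` and `t_manual` are assumptions for illustration.

```r
# Same vectors as in the example above
x <- c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23)
y <- c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43)

# Store the test result instead of only printing it
result <- cor.test(x, y)

r_hat  <- unname(result$estimate)    # correlation coefficient
t_stat <- unname(result$statistic)   # t-score
df     <- unname(result$parameter)   # degrees of freedom (n - 2)
p_val  <- result$p.value             # two-sided p-value
ci     <- result$conf.int            # 95 percent confidence interval

# Recompute the t-score from the formula as a consistency check
t_manual <- r_hat * sqrt(df) / sqrt(1 - r_hat^2)
all.equal(t_stat, t_manual)
```

Recomputing the t-score by hand is optional; it is included here only to tie the function output back to the formula given earlier.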