In statistics, **correlation** refers to the strength and direction of the relationship between two variables. A correlation coefficient ranges from -1 to 1, with -1 indicating a perfect negative relationship, 0 indicating no relationship of the kind being measured (no linear relationship, in the case of Pearson), and 1 indicating a perfect positive relationship.

There are three common ways to measure correlation:

**Pearson Correlation:** Used to measure the linear correlation between two continuous variables (e.g. height and weight).

**Spearman Correlation:** Used to measure the correlation between two ranked variables (e.g. rank of a student’s math exam score vs. rank of their science exam score in a class).

**Kendall’s Correlation:** Used when you wish to use Spearman Correlation but the sample size is small and there are many tied ranks.

This tutorial explains how to find all three types of correlations in Stata.

**Loading the Data**

For each of the following examples we will use a dataset called *auto*. You can load this dataset by typing the following into the Command box:

use http://www.stata-press.com/data/r13/auto
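If the machine has no internet access, the same example data can usually be loaded from the copy installed with Stata. This is a minimal sketch, assuming the shipped *auto* dataset matches the r13 web version used here (the values should agree, though labels may differ slightly between releases):

```stata
* Load the example dataset that is installed with Stata.
* The clear option discards any data currently in memory.
sysuse auto, clear
```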

We can get a quick look at the dataset by typing the following into the Command box:

summarize

We can see that there are 12 total variables in the dataset.
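Because missing values affect how many observations each correlation uses, it can help to check for them before going further. The commands below are one possible way to do that and are not part of the original walkthrough:

```stata
* List each variable with its storage type and label.
describe

* Count the observations with a missing repair record (rep78).
count if missing(rep78)

* Summarize missing values across the variables in one table.
misstable summarize
```

In this dataset *rep78* is the variable to watch: it has 69 non-missing values out of 74 observations, so the count should come back as 5.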

**How to Find Pearson Correlation in Stata**

We can find the Pearson Correlation Coefficient between the variables *weight* and *length* by using the **pwcorr** command:

pwcorr weight length

The Pearson Correlation Coefficient between these two variables is **0.9460**. To determine if this correlation coefficient is statistically significant, we can request the p-value by adding the **sig** option:

pwcorr weight length, sig

The reported p-value is **0.000**, meaning it is too small to show at the displayed precision rather than exactly zero. Since this is less than 0.05, the correlation between these two variables is statistically significant.
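To see the p-value without rounding, the test can be reproduced by hand from the results that pwcorr leaves behind in r(rho) and r(N). This is a rough sketch using the usual t statistic with n - 2 degrees of freedom; the local macro names (rho, n, tstat, pval) are only illustrative:

```stata
* Rerun the correlation quietly so that r(rho) and r(N) are available.
quietly pwcorr weight length
local rho = r(rho)
local n   = r(N)

* t statistic for H0: rho = 0, with n - 2 degrees of freedom.
local tstat = (`rho') * sqrt((`n' - 2) / (1 - (`rho')^2))

* Two-sided p-value; ttail() returns the upper-tail probability.
local pval = 2 * ttail(`n' - 2, abs(`tstat'))
display "t = " `tstat' "   p = " `pval'
```

The parentheses around the macro reference guard against a negative coefficient being squared incorrectly after macro expansion.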

To find the Pearson Correlation Coefficient for multiple variables, simply type in a list of variables after the **pwcorr** command:

pwcorr weight length displacement, sig

Here is how to interpret the output:

- Pearson Correlation between weight and length = 0.9460 | p-value = 0.000
- Pearson Correlation between weight and displacement = 0.8949 | p-value = 0.000
- Pearson Correlation between displacement and length = 0.8351 | p-value = 0.000
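When several pairs are tested at once, the chance of at least one false positive rises, so an adjustment for multiple comparisons may be worth considering. As one possible variation on the command above, pwcorr accepts a **bonferroni** option (a **sidak** option is also available), and **obs** prints the number of observations behind each coefficient:

```stata
* Same correlation matrix, with p-values adjusted for the three comparisons
* and the number of observations shown for each pair.
pwcorr weight length displacement, sig obs bonferroni
```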

**How to Find Spearman Correlation in Stata**

We can find the Spearman Correlation Coefficient between the variables *trunk* and *rep78* by using the **spearman** command:

spearman trunk rep78

Here is how to interpret the output:

- **Number of obs:** This is the number of pairwise observations used to calculate the Spearman Correlation Coefficient. Because there were some missing values for the variable *rep78*, Stata used only 69 (rather than the full 74) pairwise observations.
- **Spearman’s rho:** This is the Spearman correlation coefficient. In this case, it’s -0.2235, indicating there is a negative correlation between the two variables. As one increases, the other tends to decrease.
- **Prob > |t|:** This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0649, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.
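Spearman’s rho is the Pearson correlation computed on ranks rather than on the raw values. The sketch below illustrates that idea by ranking both variables and correlating the ranks; the variable names rank_trunk and rank_rep78 are made up for this example, and the result should match the rho reported by the spearman command:

```stata
* Work on a temporary copy so the original data are restored afterwards.
preserve

* Keep only the observations that have both variables.
keep if !missing(trunk, rep78)

* rank() assigns average ranks to tied values by default.
egen rank_trunk = rank(trunk)
egen rank_rep78 = rank(rep78)

* Pearson correlation of the ranks.
correlate rank_trunk rank_rep78

restore
```

Dropping the incomplete rows first matters here, because the ranks have to be computed over the same 69 observations that spearman uses.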

We can find the Spearman Correlation Coefficient for multiple variables by simply typing more variables after the **spearman** command. We can display the correlation coefficient and the corresponding p-value for each pairwise correlation by adding the **stats(rho p)** option:

spearman trunk rep78 gear_ratio, stats(rho p)

Here is how to interpret the output:

- Spearman Correlation between trunk and rep78 = -0.2235 | p-value = 0.0649
- Spearman Correlation between trunk and gear_ratio = -0.5187 | p-value = 0.0000
- Spearman Correlation between gear_ratio and rep78 = 0.4275 | p-value = 0.0002
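With more than two variables, spearman by default drops every observation that has a missing value in any of the listed variables, so all three coefficients above are based on the same rows. If using every available observation for each pair is preferred, the **pw** option requests pairwise deletion. This is a small variation on the command above, with **obs** added to show how many observations each pair uses:

```stata
* Use all available observations for each pair instead of dropping
* every row that has a missing value in any listed variable.
spearman trunk rep78 gear_ratio, stats(rho obs p) pw
```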

**How to Find Kendall’s Correlation in Stata**

We can find Kendall’s Correlation Coefficient between the variables *trunk* and *rep78* by using the **ktau** command:

ktau trunk rep78

Here is how to interpret the output:

- **Number of obs:** This is the number of pairwise observations used to calculate Kendall’s Correlation Coefficient. Because there were some missing values for the variable *rep78*, Stata used only 69 (rather than the full 74) pairwise observations.
- **Kendall’s tau-b:** This is Kendall’s correlation coefficient between the two variables. We typically use this value instead of tau-a because tau-b makes adjustments for ties. In this case, tau-b = -0.1752, indicating a negative correlation between the two variables.
- **Prob > |z|:** This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0662, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.

We can find Kendall’s Correlation Coefficient for multiple variables by simply typing more variables after the **ktau** command. We can display the correlation coefficient and the corresponding p-value for each pairwise correlation by adding the **stats(taub p)** option:

ktau trunk rep78 gear_ratio, stats(taub p)

Here is how to interpret the output:

- Kendall’s Correlation between trunk and rep78 = -0.1752 | p-value = 0.0662
- Kendall’s Correlation between trunk and gear_ratio = -0.3753 | p-value = 0.0000
- Kendall’s Correlation between gear_ratio and rep78 = 0.3206 | p-value = 0.0006
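The matrix above shows only tau-b. To get a sense of how much the adjustment for ties changes the result, tau-a can be requested in the same table. This is a sketch of one way to do so using the stats() option:

```stata
* Show tau-a and tau-b together, followed by the p-value, for each pair.
ktau trunk rep78 gear_ratio, stats(taua taub p)
```

A large gap between tau-a and tau-b for a pair suggests that ties are common in at least one of the two variables.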

Can we calculate the p-value for a hypothesized correlation coefficient other than the default of zero? For example, assuming a moderate-to-strong correlation (r = 0.6) between the two measurements, how would we test its significance (p-value)?
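One common approach to this question is Fisher’s z transformation, which the commands in this tutorial do not apply directly. The following is a rough sketch rather than a definitive answer: it assumes approximate bivariate normality, uses *weight* and *length* purely as an illustration, and treats the hypothesized value 0.6 and the local macro names as placeholders:

```stata
* Sample correlation and sample size.
quietly correlate weight length
local rho  = r(rho)
local n    = r(N)
local rho0 = 0.6

* Fisher z: atanh(r) is approximately normal with standard error 1/sqrt(n - 3).
local z = (atanh(`rho') - atanh(`rho0')) * sqrt(`n' - 3)

* Two-sided p-value for H0: rho = rho0.
local pval = 2 * normal(-abs(`z'))
display "z = " `z' "   p = " `pval'
```

A small p-value here would suggest the population correlation differs from 0.6, which is a different question from whether it differs from zero.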