How to Use corrwith() in Pandas (With Examples)


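You can use the corrwith() function in pandas to calculate the pairwise correlation between numerical columns with the same name in two different pandas DataFrames. The two DataFrames are aligned on their row index and column labels before the correlations are computed.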

This function uses the following basic syntax:

df1.corrwith(df2)

Note: This function is different from the corr() function, which calculates the pairwise correlation between the numerical columns within a single DataFrame.
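
To make the distinction concrete, here is a minimal sketch that uses two small made-up DataFrames. The names left and right and their values are illustrative assumptions, not data from this tutorial. corr() returns a matrix of correlations among the columns of one DataFrame, while corrwith() returns one value per column name shared by the two DataFrames:

```python
import pandas as pd

#two small illustrative DataFrames (hypothetical values)
left = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 1, 4, 3]})
right = pd.DataFrame({'x': [2, 4, 6, 8], 'y': [3, 4, 1, 2]})

#corr(): correlation matrix between the columns of a single DataFrame
within = left.corr()

#corrwith(): one correlation per column name shared by the two DataFrames
across = left.corrwith(right)

print(within)
print(across)
```

In this sketch the x column of right is exactly twice the x column of left, and the y column of right falls as the y column of left rises, so the two corrwith() values sit at the extremes of the correlation range.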

The following example shows how to use the corrwith() function in practice.

Example: How to Use corrwith() in Pandas

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'points': [18, 22, 29, 25, 14, 11],
                    'assists': [4, 5, 5, 4, 8, 12],
                    'rebounds': [10, 6, 4, 6, 3, 5]})

print(df1)

  team  points  assists  rebounds
0    A      18        4        10
1    B      22        5         6
2    C      29        5         4
3    D      25        4         6
4    E      14        8         3
5    F      11       12         5

#create second DataFrame 
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'points': [22, 25, 27, 35, 25, 20],
                    'assists': [15, 13, 8, 8, 5, 8],
                    'rebs': [4, 11, 12, 8, 7, 10]})

print(df2)

  team  points  assists  rebs
0    A      22       15     4
1    B      25       13    11
2    C      27        8    12
3    D      35        8     8
4    E      25        5     7
5    F      20        8    10

We can use the corrwith() function to calculate the correlation between the numeric columns with the same names in the two DataFrames:

#calculate correlation between numeric columns with same names in each DataFrame
#numeric_only=True (pandas 1.5+) excludes the non-numeric team column
df1.corrwith(df2, numeric_only=True)

points      0.677051
assists    -0.478184
rebounds         NaN
rebs             NaN
dtype: float64

From the output we can see:

  • The correlation between the values in the points columns in the two DataFrames is 0.677.
  • The correlation between the values in the assists columns in the two DataFrames is -0.478.
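
One way to double-check these values is to correlate each pair of same-named columns individually with the Series corr() method. The following is a minimal sketch that rebuilds only the points and assists columns from the two DataFrames above:

```python
import pandas as pd

#same points and assists values as df1 and df2 above
df1 = pd.DataFrame({'points': [18, 22, 29, 25, 14, 11],
                    'assists': [4, 5, 5, 4, 8, 12]})
df2 = pd.DataFrame({'points': [22, 25, 27, 35, 25, 20],
                    'assists': [15, 13, 8, 8, 5, 8]})

#correlate each pair of same-named columns individually
points_corr = df1['points'].corr(df2['points'])
assists_corr = df1['assists'].corr(df2['assists'])

print(round(points_corr, 6))
print(round(assists_corr, 6))
```

The two printed values should agree with the points and assists entries returned by corrwith().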

Because rebounds exists only in the first DataFrame and rebs exists only in the second, neither column has a match in the other DataFrame, so a value of NaN is returned for each of them.
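
If the NaN entries are not wanted, two options are worth considering: passing drop=True to leave unmatched labels out of the result, or renaming a column so that the names line up. The following is a minimal sketch that uses only the numeric columns from the DataFrames above. Treating rebs and rebounds as the same statistic is an assumption made here for illustration:

```python
import pandas as pd

#numeric columns from df1 and df2 above
df1 = pd.DataFrame({'points': [18, 22, 29, 25, 14, 11],
                    'assists': [4, 5, 5, 4, 8, 12],
                    'rebounds': [10, 6, 4, 6, 3, 5]})
df2 = pd.DataFrame({'points': [22, 25, 27, 35, 25, 20],
                    'assists': [15, 13, 8, 8, 5, 8],
                    'rebs': [4, 11, 12, 8, 7, 10]})

#drop=True omits labels that are missing from either DataFrame
matched = df1.corrwith(df2, drop=True)
print(matched)

#alternative: rename rebs so that it pairs with rebounds
renamed = df1.corrwith(df2.rename(columns={'rebs': 'rebounds'}))
print(renamed)
```

The first result should contain only points and assists, while the second should contain a correlation for all three columns.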

Note #1: By default, the corrwith() function calculates the Pearson correlation coefficient between columns, but you can also specify method='kendall' or method='spearman' to calculate a different type of correlation coefficient instead.
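
To illustrate what a rank-based coefficient measures, the following sketch computes the Spearman correlation by hand: it ranks each column and then applies the default Pearson calculation to the ranks. This is not a call to method='spearman' itself. It is the rank-then-Pearson equivalent, which should give the same values while keeping each step visible:

```python
import pandas as pd

#points and assists values from df1 and df2 above
df1 = pd.DataFrame({'points': [18, 22, 29, 25, 14, 11],
                    'assists': [4, 5, 5, 4, 8, 12]})
df2 = pd.DataFrame({'points': [22, 25, 27, 35, 25, 20],
                    'assists': [15, 13, 8, 8, 5, 8]})

#default Pearson correlation for comparison
pearson = df1.corrwith(df2)

#Spearman correlation is the Pearson correlation of the ranks,
#so rank each column (tied values share the average rank) first
spearman_by_rank = df1.rank().corrwith(df2.rank())

print(pearson)
print(spearman_by_rank)
```

Because ranks ignore the size of the gaps between values, the rank-based results can differ noticeably from the Pearson results on the same data.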

Note #2: You can find the complete documentation for the corrwith() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Calculate Correlation By Group in Pandas
How to Calculate Rolling Correlation in Pandas
How to Calculate Correlation Between Two Columns in Pandas
