Pandas: How to Use an Equivalent of R's mutate() Function


In the R programming language, we can use the mutate() function from the dplyr package to quickly add new columns to a data frame that are calculated from existing columns.

For example, the following code shows how to calculate the mean value of a column for each group in R and add that value as a new column in a data frame:

library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(30, 22, 19, 14, 14, 11, 20, 28))

#add new column that shows mean points by team
df <- df %>%
      group_by(team) %>%
      mutate(mean_points = mean(points))

#view updated data frame
df

  team  points mean_points           
1 A         30        21.2
2 A         22        21.2
3 A         19        21.2
4 A         14        21.2
5 B         14        18.2
6 B         11        18.2
7 B         20        18.2
8 B         28        18.2

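In pandas, the closest equivalent of a grouped mutate() is the transform() function applied to a groupby() object. It returns a result with the same number of rows as the original DataFrame, so the result can be assigned directly as a new column.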

The following example shows how to use this function in practice.

Example: Using transform() in pandas to Replicate mutate() in R

Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  team  points
0    A      30
1    A      22
2    A      19
3    A      14
4    B      14
5    B      11
6    B      20
7    B      28

We can use the transform() function to add a new column called mean_points that shows the mean points scored by each team:

#add new column to DataFrame that shows mean points by team
df['mean_points'] = df.groupby('team')['points'].transform('mean')

#view updated DataFrame
print(df)

  team  points  mean_points
0    A      30        21.25
1    A      22        21.25
2    A      19        21.25
3    A      14        21.25
4    B      14        18.25
5    B      11        18.25
6    B      20        18.25
7    B      28        18.25

The mean points value for players on team A was 21.25 and the mean points value for players on team B was 18.25, so these values were assigned accordingly to each player in a new column.

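Notice that this matches the results we got from using the mutate() function in the introductory example. R displays 21.2 and 18.2 only because tibbles print three significant digits by default; the underlying values are 21.25 and 18.25.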
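To mirror the chained style of the dplyr pipeline more closely, the same column can also be added with DataFrame.assign(), which returns a new DataFrame instead of modifying df in place. The following is a minimal sketch of that approach; the name df_chained is illustrative and not part of the example above.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#add mean points by team in one chained expression (df itself is unchanged)
df_chained = df.assign(
    mean_points=lambda d: d.groupby('team')['points'].transform('mean'))

#view new DataFrame
print(df_chained)
```

Because assign() leaves the original DataFrame untouched, this style can be convenient when several new columns are built in one chain.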

You can also pass a lambda function to transform() to perform a custom calculation.

For example, the following code shows how to use lambda to calculate the share of each team's total points scored by each player (expressed as a fraction between 0 and 1):

#create new column called percent_of_points
df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x/x.sum())

#view updated DataFrame
print(df)

  team  points  mean_points  percent_of_points
0    A      30        21.25           0.352941
1    A      22        21.25           0.258824
2    A      19        21.25           0.223529
3    A      14        21.25           0.164706
4    B      14        18.25           0.191781
5    B      11        18.25           0.150685
6    B      20        18.25           0.273973
7    B      28        18.25           0.383562

Here’s how to interpret the output:

  • The first player on team A scored 30 out of 85 total points among team A players. Thus, this player's share of the team's points was 30/85 = 0.352941, or about 35.3%.
  • The second player on team A scored 22 out of 85 total points among team A players. Thus, this player's share of the team's points was 22/85 = 0.258824, or about 25.9%.

And so on.
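One way to sanity-check values like these is to confirm that the shares within each team add up to 1. The sketch below does that and also converts the fractions to percentages; the names share_totals and percent_label are illustrative assumptions, not part of the original example.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#share of team points scored by each player
df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x/x.sum())

#sum the shares within each team (each total should be 1, up to floating-point rounding)
share_totals = df.groupby('team')['percent_of_points'].sum()
print(share_totals)

#express the shares as percentages rounded to one decimal place
df['percent_label'] = (df['percent_of_points'] * 100).round(1)
print(df)
```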

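Note that lambda is not an argument of transform() but a function that we pass to it. We can pass any lambda or named function to perform a custom calculation, as long as it returns either a single value per group or a result with the same length as the group.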
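As a further illustration, a custom calculation can be written as a named function and passed to transform() in place of a lambda. This is only a sketch: the helper diff_from_mean and the column name diff_from_team_mean are hypothetical and chosen here for illustration.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#hypothetical helper: how far each value is from its group's mean
def diff_from_mean(x):
    return x - x.mean()

#apply the helper within each team
df['diff_from_team_mean'] = df.groupby('team')['points'].transform(diff_from_mean)

#view updated DataFrame
print(df)
```

A named function can be easier to read and reuse than a lambda once the calculation grows beyond a single expression.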

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas
How to Use Groupby and Plot in Pandas
How to Count Unique Values Using GroupBy in Pandas
