Pandas: Use Groupby to Calculate Mean and Not Ignore NaNs

When using the pandas groupby() function to group by one column and calculate the mean value of another column, pandas will ignore NaN values by default.

If you would instead like the mean to be NaN for any group that contains a NaN value, you can use the following basic syntax:

df.groupby('team').agg({'points': lambda x: x.mean(skipna=False)})

This particular example will group the rows of the DataFrame by the team column and then calculate the mean value of the points column without ignoring NaN values.

The following example shows how to use this syntax in practice.

Example: Use pandas groupby() and Don’t Ignore NaNs

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

#view DataFrame
print(df)

  team  points
0    A    15.0
1    A     NaN
2    A    24.0
3    A    25.0
4    A    20.0
5    B    35.0
6    B    34.0
7    B    19.0
8    B    14.0
9    B    12.0

Suppose we use the following syntax to calculate the mean value of points, grouped by team:

#calculate mean of points, grouped by team
df.groupby('team')['points'].mean()

team
A    21.0
B    22.8
Name: points, dtype: float64

Notice that the mean value of points for each team is returned, even though there is a NaN value for team A in the points column.

By default, pandas simply ignores the NaN value when calculating the mean.
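
The same default can be seen on a single Series, outside of any groupby. The following is a minimal sketch that uses team A's points values from the example above; the variable names are chosen here for illustration only:

```python
import numpy as np
import pandas as pd

# points values for team A, taken from the example DataFrame
team_a_points = pd.Series([15, np.nan, 24, 25, 20])

# default behavior (skipna=True): the NaN is dropped before averaging
default_mean = team_a_points.mean()

# skipna=False: any NaN in the Series makes the result NaN
strict_mean = team_a_points.mean(skipna=False)

print(default_mean)  # 21.0
print(strict_mean)   # nan
```

With the NaN dropped, the mean is taken over the four remaining values (15, 24, 25, 20), which gives 21.0.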

If you would instead like NaN to be returned as the mean for any team whose points column contains a NaN, you can use the following syntax:

#calculate mean points value grouped by team and don't ignore NaNs
df.groupby('team').agg({'points': lambda x: x.mean(skipna=False)})

      points
team        
A        NaN
B       22.8

Notice that a NaN value is returned as the mean points value for team A this time.

By using the argument skipna=False, we told pandas not to ignore the NaN values when calculating the mean.
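
If a Series is preferred over a one-column DataFrame, or if calling a lambda once per group turns out to be slow on data with many groups, one possible alternative is sketched below. It rebuilds the same DataFrame so that it runs on its own; the names strict_mean, default_mean, has_nan, and masked_mean are illustrative and not part of the original example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [15, np.nan, 24, 25, 20, 35, 34, 19, 14, 12]})

# option 1: select the column first so the result is a Series
strict_mean = df.groupby('team')['points'].agg(lambda x: x.mean(skipna=False))

# option 2: compute the default mean, then replace it with NaN
# for every team that has at least one NaN in the points column
default_mean = df.groupby('team')['points'].mean()
has_nan = df['points'].isna().groupby(df['team']).any()
masked_mean = default_mean.mask(has_nan)

print(strict_mean)
print(masked_mean)
```

Both options should return NaN for team A and 22.8 for team B. The second option avoids the lambda by relying only on built-in groupby reductions.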

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Count Unique Values Using Pandas GroupBy
How to Apply Function to Pandas Groupby
How to Create Bar Plot from Pandas GroupBy
