Pandas: How to Use describe() for Only Mean and Std


You can use the describe() function to generate descriptive statistics for variables in a pandas DataFrame.

By default, the describe() function calculates the following metrics for each numeric variable in a DataFrame:

  • count (number of values)
  • mean (mean value)
  • std (standard deviation)
  • min (minimum value)
  • 25% (25th percentile)
  • 50% (50th percentile)
  • 75% (75th percentile)
  • max (max value)
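The metric names listed above are the row labels (the index) of the DataFrame that describe() returns. A minimal sketch to confirm this, using a small hypothetical DataFrame named demo that is not part of the example further down:

```python
import pandas as pd

# Hypothetical two-column DataFrame, used only for illustration
demo = pd.DataFrame({'x': [1, 2, 3, 4],
                     'y': [10.0, 20.0, 30.0, 40.0]})

# The row labels of the describe() output are the statistic names
labels = list(demo.describe().index)
print(labels)
```

Because the statistics are stored as labeled rows, they can be selected by name like any other row.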

However, you can use the following syntax to only calculate the mean and standard deviation for each numeric variable:

df.describe().loc[['mean', 'std']]

The following example shows how to use this syntax in practice.

Example: Use describe() in Pandas to Only Calculate Mean and Std

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

If we use the describe() function, we can calculate descriptive statistics for each numeric variable in the DataFrame:

#calculate descriptive statistics for each numeric variable
df.describe()

          points   assists   rebounds
count   8.000000   8.00000   8.000000
mean   18.250000   7.75000   8.375000
std     5.365232   2.54951   2.559994
min    11.000000   4.00000   5.000000
25%    14.000000   6.50000   6.000000
50%    18.500000   8.00000   8.500000
75%    20.500000   9.00000  10.250000
max    28.000000  12.00000  12.000000

However, we can use the following syntax to only calculate the mean and standard deviation for each numeric variable:

#only calculate mean and standard deviation of each numeric variable
df.describe().loc[['mean', 'std']]

         points  assists  rebounds
mean  18.250000  7.75000  8.375000
std    5.365232  2.54951  2.559994

Notice that the output only includes the mean and standard deviation for each numeric variable.

Note that describe() still calculates every descriptive statistic, just as before. We then use the loc indexer to select only the rows labeled mean and std from that output.
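If the other statistics are never needed, one possible alternative is to compute only the mean and standard deviation directly. The sketch below assumes a different approach from the one shown above: it uses select_dtypes() to drop the non-numeric team column and agg() to compute the two statistics. The variable name summary is chosen here for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Keep only the numeric columns, then compute just the two statistics
summary = df.select_dtypes(include='number').agg(['mean', 'std'])
print(summary)
```

Both approaches report the sample standard deviation, so the values should match the mean and std rows of the describe() output.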

Related: Pandas loc vs. iloc: What’s the Difference?

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Use describe() by Group
Pandas: How to Use describe() with Specific Percentiles
Pandas: How to Use describe() and Suppress Scientific Notation
