# How to Calculate Summary Statistics for a Pandas DataFrame

You can use the following methods to calculate summary statistics for variables in a pandas DataFrame:

Method 1: Calculate Summary Statistics for All Numeric Variables

```python
df.describe()
```

Method 2: Calculate Summary Statistics for All String Variables

```python
df.describe(include='object')
```

Method 3: Calculate Summary Statistics Grouped by a Variable

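```python
# one summary statistic per call, computed separately for each group
# (if the DataFrame has other non-numeric columns, pass numeric_only=True to mean() and median())
df.groupby('group_column').mean()

df.groupby('group_column').median()

df.groupby('group_column').max()

...
```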

The following examples show how to use each method in practice with the following pandas DataFrame:

```python
import pandas as pd
import numpy as np

# create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 30],
                   'assists': [5, np.nan, 7, 9, 12, 9, 9, 4, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan, 6]})

# view DataFrame
print(df)
```

```
  team  points  assists  rebounds
0    A      18      5.0      11.0
1    A      22      NaN       8.0
2    A      19      7.0      10.0
3    A      14      9.0       6.0
4    B      14     12.0       6.0
5    B      11      9.0       5.0
6    B      20      9.0       9.0
7    B      28      4.0       NaN
8    B      30      5.0       6.0
```
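By default, `describe()` summarizes only the numeric columns of a DataFrame. As a quick check of which columns that covers here, the following sketch uses `select_dtypes()`; the names `numeric_cols` and `other_cols` are just illustrative:

```python
import numpy as np
import pandas as pd

# same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 30],
                   'assists': [5, np.nan, 7, 9, 12, 9, 9, 4, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan, 6]})

# columns that a plain df.describe() call will summarize
numeric_cols = df.select_dtypes(include='number').columns
print(list(numeric_cols))

# remaining columns (here just 'team') are skipped by the default call
other_cols = df.columns.difference(numeric_cols)
print(list(other_cols))
```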

### Example 1: Calculate Summary Statistics for All Numeric Variables

The following code shows how to calculate the summary statistics for each numeric variable in the DataFrame:

```python
df.describe()
```

```
          points    assists   rebounds
count   9.000000   8.000000   8.000000
mean   19.555556   7.500000   7.625000
std     6.366143   2.725541   2.199838
min    11.000000   4.000000   5.000000
25%    14.000000   5.000000   6.000000
50%    19.000000   8.000000   7.000000
75%    22.000000   9.000000   9.250000
max    30.000000  12.000000  11.000000
```

We can see the following summary statistics for each of the three numeric variables:

• count: The number of non-null values
• mean: The mean value
• std: The sample standard deviation
• min: The minimum value
• 25%: The value at the 25th percentile
• 50%: The value at the 50th percentile (also the median)
• 75%: The value at the 75th percentile
• max: The maximum value
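Each of these statistics corresponds to an individual Series method, which is handy when you only need one of them. A minimal sketch that rebuilds the `rebounds` column of the `describe()` output, assuming the default settings (sample standard deviation with `ddof=1`, linearly interpolated percentiles):

```python
import numpy as np
import pandas as pd

# same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 30],
                   'assists': [5, np.nan, 7, 9, 12, 9, 9, 4, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan, 6]})

col = df['rebounds']

# each method skips NaN values, just like describe()
stats = {
    'count': col.count(),        # number of non-null values
    'mean': col.mean(),
    'std': col.std(),            # sample standard deviation (ddof=1)
    'min': col.min(),
    '25%': col.quantile(0.25),   # linear interpolation by default
    '50%': col.median(),
    '75%': col.quantile(0.75),
    'max': col.max(),
}
print(pd.Series(stats))

# the values agree with the 'rebounds' column of describe()
assert np.allclose(pd.Series(stats), df.describe()['rebounds'])
```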

### Example 2: Calculate Summary Statistics for All String Variables

The following code shows how to calculate the summary statistics for each string variable in the DataFrame:

```python
df.describe(include='object')
```

```
       team
count     9
unique    2
top       B
freq      5
```

We can see the following summary statistics for the one string variable in our DataFrame:

• count: The number of non-null values
• unique: The number of unique values
• top: The most frequently occurring value
• freq: The count of the most frequently occurring value
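These four statistics can also be computed directly from the column. A sketch on the same `team` values, using `value_counts()` (which sorts by frequency, most common first) to get `top` and `freq`; the `summary` dictionary is just an illustrative container:

```python
import pandas as pd

# the team column from the DataFrame above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']})
team = df['team']

counts = team.value_counts()    # frequency of each value, most common first
summary = {
    'count': team.count(),      # non-null values
    'unique': team.nunique(),   # distinct non-null values
    'top': counts.index[0],     # most frequent value
    'freq': counts.iloc[0],     # how often that value occurs
}

for name, value in summary.items():
    print(f"{name}: {value}")
```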

### Example 3: Calculate Summary Statistics Grouped by a Variable

The following code shows how to calculate the mean value for all numeric variables, grouped by the team variable:

```python
df.groupby('team').mean()
```

```
      points  assists  rebounds
team
A      18.25      7.0      8.75
B      20.60      7.8      6.50
```

The output displays the mean value for the points, assists, and rebounds variables, grouped by the team variable.
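Method 3 computes one statistic per call, but several can be requested at once with `agg()`. A sketch on the same DataFrame; the result has a two-level column index of (variable, statistic):

```python
import numpy as np
import pandas as pd

# same DataFrame as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 30],
                   'assists': [5, np.nan, 7, 9, 12, 9, 9, 4, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, np.nan, 6]})

# mean, median, and max of every numeric column, per team
summary = df.groupby('team').agg(['mean', 'median', 'max'])
print(summary)

# select the statistics for a single variable
print(summary['points'])
```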

Note that we can use similar syntax to calculate a different summary statistic, such as the median:

```python
df.groupby('team').median()
```

```
      points  assists  rebounds
team
A       18.5      7.0       9.0
B       20.0      9.0       6.0
```

The output displays the median value for the points, assists, and rebounds variables, grouped by the team variable.
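To get the full set of `describe()` statistics for each group in one step, `describe()` can also be called on a grouped column. A minimal sketch, rebuilding only the `team` and `points` columns from the DataFrame above:

```python
import pandas as pd

# team and points columns from the DataFrame above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 30]})

# one row per team, one column per describe() statistic
result = df.groupby('team')['points'].describe()
print(result)
```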

Note: You can find the complete documentation for the `describe()` function in the pandas API reference, under `pandas.DataFrame.describe`.