# How to Calculate a Five Number Summary in Pandas

A five number summary is a way to summarize a dataset using the following five values:

• The minimum
• The first quartile
• The median
• The third quartile
• The maximum

The five number summary is useful because it provides a concise summary of the distribution of the data in the following ways:

• It tells us where the middle value is located, using the median.
• It tells us how spread out the data is, using the first and third quartiles.
• It tells us the range of the data, using the minimum and the maximum.

The easiest way to calculate a five number summary for the numeric variables in a pandas DataFrame is to call the describe() method and keep only the five relevant rows:

```
df.describe().loc[['min', '25%', '50%', '75%', 'max']]
```

The following example shows how to use this syntax in practice.

## Example: Calculate Five Number Summary in Pandas DataFrame

Suppose we have the following pandas DataFrame that contains information about various basketball players:

```
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12
```

We can use the following syntax to calculate the five number summary for each numeric variable in the DataFrame:

```
#calculate five number summary for each numeric variable
df.describe().loc[['min', '25%', '50%', '75%', 'max']]

     points  assists  rebounds
min    11.0      4.0      5.00
25%    14.0      6.5      6.00
50%    18.5      8.0      8.50
75%    20.5      9.0     10.25
max    28.0     12.0     12.00
```

Here’s how to interpret the output for the points variable:

• The minimum value is 11.
• The value at the 25th percentile is 14.
• The value at the 50th percentile is 18.5.
• The value at the 75th percentile is 20.5.
• The maximum value is 28.

We can interpret the values for the assists and rebounds variables in a similar manner.
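The spread and range mentioned earlier can be read directly off these five values. The following is a minimal sketch, assuming the same points data as above; the variable names `summary`, `iqr`, and `value_range` are chosen here for illustration only:

```python
import pandas as pd

# Same points values as in the example DataFrame
points = pd.Series([18, 22, 19, 14, 14, 11, 20, 28], name='points')

# Five number summary for the points variable
summary = points.describe().loc[['min', '25%', '50%', '75%', 'max']]

# Interquartile range: width of the middle 50% of the values
iqr = summary['75%'] - summary['25%']

# Range: distance between the maximum and the minimum
value_range = summary['max'] - summary['min']

print(iqr)          # 6.5
print(value_range)  # 17.0
```

The interquartile range (20.5 - 14 = 6.5) describes how spread out the middle half of the points values is, while the range (28 - 11 = 17) describes the full span of the data.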

If you’d only like to calculate the five number summary for one specific variable in the DataFrame, you can use the following syntax:

```
#calculate five number summary for the points variable
df['points'].describe().loc[['min', '25%', '50%', '75%', 'max']]

min    11.0
25%    14.0
50%    18.5
75%    20.5
max    28.0
Name: points, dtype: float64
```

The output now displays the five number summary only for the points variable.
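As a cross-check, the same five values can also be obtained with the quantile() method, since the 0th and 100th percentiles correspond to the minimum and maximum. This is a sketch of an alternative rather than the approach used above; it assumes the default linear interpolation of quantile() and uses `numeric_only=True` to skip the non-numeric team column:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Quantile levels 0 and 1 give the minimum and maximum;
# 0.25, 0.5 and 0.75 give the quartiles and the median
summary = df.quantile([0, 0.25, 0.5, 0.75, 1], numeric_only=True)

print(summary)
```

The resulting rows are labeled with the quantile levels (0.00, 0.25, 0.50, 0.75, 1.00) instead of min, 25%, 50%, 75%, and max, but the values should match the describe() output.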