How to Plot Distribution of Column Values in Pandas


You can use the following methods to plot a distribution of column values in a pandas DataFrame:

Method 1: Plot Distribution of Values in One Column

df['my_column'].plot(kind='kde')

Method 2: Plot Distribution of Values in One Column, Grouped by Another Column

df.groupby('group_column')['values_column'].plot(kind='kde')

The examples below show how to use each method in practice with this pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                            'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'points': [3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 
                              8, 7, 8, 9, 12, 12, 12, 14, 15, 17]})

#view DataFrame
print(df)

   team  points
0     A       3
1     A       3
2     A       4
3     A       5
4     A       4
5     A       7
6     A       7
7     A       7
8     A      10
9     A      11
10    B       8
11    B       7
12    B       8
13    B       9
14    B      12
15    B      12
16    B      12
17    B      14
18    B      15
19    B      17

Example 1: Plot Distribution of Values in One Column

The following code shows how to plot the distribution of values in the points column:

#plot distribution of values in points column
df['points'].plot(kind='kde')

Note that kind='kde' tells pandas to use kernel density estimation, which produces a smooth curve that summarizes the distribution of values for a variable. pandas relies on SciPy for this calculation, so SciPy must be installed for the plot to work.
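The smoothness of a KDE curve depends on its bandwidth. As a rough sketch (not part of the original example), the code below draws the default curve next to one with a narrower bandwidth by passing bw_method, which pandas forwards to SciPy's gaussian_kde. The value 0.3 and the labels are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# same values as the points column in the example DataFrame
points = pd.Series([3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                    8, 7, 8, 9, 12, 12, 12, 14, 15, 17], name='points')

fig, ax = plt.subplots()

# default bandwidth (Scott's rule)
points.plot(kind='kde', ax=ax, label='default bandwidth')

# narrower bandwidth; 0.3 is an arbitrary illustrative value
points.plot(kind='kde', bw_method=0.3, ax=ax, label='bw_method=0.3')

ax.set_xlabel('Points')
ax.legend()
```

Smaller bw_method values follow the individual data points more closely, while larger values produce a smoother curve.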

If you'd like to create a histogram instead, you can specify kind='hist' as follows:

#plot distribution of values in points column using histogram
df['points'].plot(kind='hist', edgecolor='black')

This method uses bars to represent frequencies of values in the points column as opposed to a smooth line that summarizes the shape of the distribution.
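To see which frequencies the bars represent, one option is to count the values per bin with NumPy and pass the same bin edges to the plot. This is a minimal sketch; the bin edges 0, 5, 10, 15, 20 are an assumption chosen for illustration, not the default that pandas uses (10 equal-width bins).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# same values as the points column in the example DataFrame
points = pd.Series([3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                    8, 7, 8, 9, 12, 12, 12, 14, 15, 17], name='points')

# illustrative bin edges: [0, 5), [5, 10), [10, 15), [15, 20]
bin_edges = [0, 5, 10, 15, 20]

# count how many values fall into each bin
counts, edges = np.histogram(points, bins=bin_edges)
print(counts)  # [4 8 6 2]

# draw the histogram with the same bins
ax = points.plot(kind='hist', bins=bin_edges, edgecolor='black')
ax.set_xlabel('Points')
```

Each bar's height matches the corresponding count, so the four bars have heights 4, 8, 6, and 2.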

Example 2: Plot Distribution of Values in One Column, Grouped by Another Column

The following code shows how to plot the distribution of values in the points column, grouped by the team column:

import matplotlib.pyplot as plt

#plot distribution of points by team 
df.groupby('team')['points'].plot(kind='kde')

#add legend (groupby sorts the teams, so the lines are drawn in the order A, B)
plt.legend(['A', 'B'], title='Team')

#add x-axis label
plt.xlabel('Points')

With matplotlib's default color cycle, the blue line shows the distribution of points for players on team A, while the orange line shows the distribution for players on team B.
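The legend in the code above is typed in by hand, so it has to match the order in which the groups are drawn. One possible alternative, sketched here as an assumption rather than the method this tutorial uses, is to loop over the groups and label each curve with its own group key.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# same DataFrame as in the examples above
df = pd.DataFrame({'team': ['A'] * 10 + ['B'] * 10,
                   'points': [3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                              8, 7, 8, 9, 12, 12, 12, 14, 15, 17]})

fig, ax = plt.subplots()

# draw one KDE curve per team and label it with the team name
for team, group in df.groupby('team'):
    group['points'].plot(kind='kde', ax=ax, label=team)

# the legend picks up the labels set in the loop
ax.legend(title='Team')
ax.set_xlabel('Points')
```

Because each label comes from the group key, the legend stays correct if teams are added or renamed.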

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Add Titles to Plots in Pandas
How to Adjust the Figure Size of a Pandas Plot
How to Plot Multiple Pandas DataFrames in Subplots
How to Create and Customize Plot Legends in Pandas
