Matplotlib vs. ggplot2: Which Should You Use?


Two of the most popular data visualization libraries in data science are ggplot2 and Matplotlib.

The ggplot2 library is used in the R statistical programming language, while Matplotlib is used in Python.

Although both libraries allow you to create highly customized data visualizations, ggplot2 often lets you do so in fewer lines of code than Matplotlib.

To illustrate this point, we’ll show how to create the same types of charts using both libraries.

Line Charts: ggplot2 vs. Matplotlib

The following code shows how to create a line chart using ggplot2:

library(ggplot2)

#create data frame
df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 sales=c(2, 4, 5, 8, 6, 12, 15, 19, 15, 22))

#create line chart (ggplot2 3.4.0+ uses linewidth instead of size for lines)
ggplot(df, aes(x=day, y=sales)) +
  geom_line(linewidth=1.2, col='purple') +
  ggtitle('Sales by Day') +
  xlab('Day') +
  ylab('Sales')

And the following code shows how to create the same line chart using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [2, 4, 5, 8, 6, 12, 15, 19, 15, 22]})

#create line chart
plt.plot(df.day, df.sales, color='purple')
plt.title('Sales by Day', loc='left')
plt.ylabel('Sales')
plt.xlabel('Day')

For this example, the number of lines of code needed to generate each plot is roughly the same between ggplot2 and Matplotlib.
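For comparison, the same chart can also be written with Matplotlib's object-oriented interface. This is only a sketch of an alternative style, not the approach used above; the names `fig` and `ax` are just conventional variable names. The line count stays roughly the same.

```python
import pandas as pd
import matplotlib.pyplot as plt

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [2, 4, 5, 8, 6, 12, 15, 19, 15, 22]})

#create a figure and a single set of axes
fig, ax = plt.subplots()

#draw the line, then set the title and both axis labels
ax.plot(df['day'], df['sales'], color='purple')
ax.set_title('Sales by Day', loc='left')
ax.set(xlabel='Day', ylabel='Sales')
```

When running this as a script rather than in a notebook, a final call to plt.show() is typically needed to display the figure.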

Scatter Plots: ggplot2 vs. Matplotlib

The following code shows how to create a scatter plot in ggplot2 in which the points are colored by category:

library(ggplot2)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 assists=c(1, 2, 2, 4, 5, 7, 8, 10),
                 points=c(4, 6, 10, 8, 12, 15, 22, 28))

#create scatter plot
ggplot(df, aes(x=assists, y=points)) +
  geom_point(aes(col=team), size=3)

And the following code shows how to create the same scatter plot using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'assists': [1, 2, 2, 4, 5, 7, 8, 10],
                   'points': [4, 6, 10, 8, 12, 15, 22, 28]})

#define colors to use
color_list = []
for x in df['team']:
    if x == 'A':
        color_list.append('#F8766D')
    else:
        color_list.append('#00BFC4')

#create scatter plot
plt.scatter(df.assists, df.points, c=color_list)
plt.ylabel('points')
plt.xlabel('assists')

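Notice that we had to use more lines of code in Matplotlib to generate the same plot as ggplot2, because Matplotlib does not map a categorical column to colors automatically. The ggplot2 version also adds a legend for the team colors, which the Matplotlib code above does not.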
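One way to get closer to the ggplot2 output, including a legend, is to draw each team as its own scatter series. The following is a minimal sketch under that assumption; the `palette` dictionary is a hypothetical helper introduced here for illustration, and it reuses the two hex colors from the example above.

```python
import pandas as pd
import matplotlib.pyplot as plt

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'assists': [1, 2, 2, 4, 5, 7, 8, 10],
                   'points': [4, 6, 10, 8, 12, 15, 22, 28]})

#hypothetical lookup from team name to color
palette = {'A': '#F8766D', 'B': '#00BFC4'}

fig, ax = plt.subplots()

#plot one scatter series per team so each gets its own legend entry
for team, group in df.groupby('team'):
    ax.scatter(group['assists'], group['points'],
               color=palette[team], label=team)

ax.set(xlabel='assists', ylabel='points')
ax.legend(title='team')
```

Even in this form, the Matplotlib version is still longer than the two-line ggplot2 call, since the colors and the legend both have to be specified by hand.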

Histograms: ggplot2 vs. Matplotlib

The following code shows how to create a histogram in ggplot2:

library(ggplot2)

#create data frame
df <- data.frame(x=c(2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
                     10, 11, 11, 11, 12, 13, 14, 14))

#create histogram
ggplot(df, aes(x=x)) +
  geom_histogram(bins=6, fill='red', color='black') +
  ggtitle('My Histogram')

And the following code shows how to create a similar histogram using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt 

#create DataFrame
df = pd.DataFrame({'x': [2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
                         10, 11, 11, 11, 12, 13, 14, 14]})

#create histogram
plt.hist(df['x'], bins=6, color='red', ec='black')
plt.title('My Histogram', loc='left') 
plt.xlabel('x') 
plt.ylabel('Count')

Once again, the Matplotlib version requires slightly more code than ggplot2, because the axis labels have to be set explicitly while ggplot2 fills them in by default.
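To see which bins the Matplotlib histogram actually uses, the counts can be computed directly with NumPy, which plt.hist relies on for its binning. This is a small sketch for inspection only; ggplot2 picks its bin boundaries differently by default, which is likely why the two histograms are similar rather than identical.

```python
import numpy as np

x = [2, 2, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8,
     10, 11, 11, 11, 12, 13, 14, 14]

#split the range from min(x) to max(x) into 6 equal-width bins
counts, edges = np.histogram(x, bins=6)

print(counts.tolist())  # [2, 5, 3, 2, 4, 4]
print(edges.tolist())   # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.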

Conclusion

Both ggplot2 and Matplotlib allow you to create highly customizable data visualizations, but ggplot2 tends to use less code.

Often the preference between ggplot2 and Matplotlib simply comes down to which programming language you use for data analysis.

People who use Python tend to use Matplotlib since they can perform their data analysis and create data visualizations using one programming language.

Conversely, people who use R tend to use ggplot2 for the same reason: they can keep all of their data analysis and visualization work in one programming language.
