How to Create a Q-Q Plot in Python

A Q-Q plot, short for “quantile-quantile” plot, is often used to assess whether a set of data plausibly came from some theoretical distribution.

In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.

This tutorial explains how to create a Q-Q plot for a set of data in Python.

Example: Q-Q Plot in Python

Suppose we have the following dataset of 1,000 values:

import numpy as np

#make this example reproducible
np.random.seed(0)

#create dataset with 1,000 values that follow a normal distribution
data = np.random.normal(0, 1, 1000)

#view first 10 values
data[:10]

array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
       -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ])

To create a Q-Q plot for this dataset, we can use the qqplot() function from the statsmodels library:

import statsmodels.api as sm
import matplotlib.pyplot as plt

#create Q-Q plot with 45-degree line added to plot
fig = sm.qqplot(data, line='45')
plt.show()

[Figure: Q-Q plot in Python]

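In a Q-Q plot, the x-axis displays the theoretical quantiles. This means it doesn’t show your actual data; instead, it shows the values you would expect to see at each quantile if the data followed the theoretical distribution, which for qqplot() is a standard normal distribution by default.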

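The y-axis displays the quantiles of your actual data. If the points fall along a roughly straight line, the data is consistent with a normal distribution. The 45-degree line is the right reference only when the data is on the same scale as the theoretical distribution, which here means a mean of 0 and a standard deviation of 1. Normally distributed data with a different mean or standard deviation still forms a straight line, but that line has a different intercept or slope.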
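To make the two axes concrete, here is a minimal sketch that computes Q-Q coordinates by hand with NumPy and the standard library. The plotting positions (i - 0.5) / n are one common convention and an assumption of this sketch; statsmodels may use slightly different positions, so the values will not match qqplot() exactly.

```python
from statistics import NormalDist

import numpy as np

# sample of 1,000 standard normal values (the generator seed is arbitrary)
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 1000)

# plotting positions: the probability assigned to each ordered observation
n = len(data)
probs = (np.arange(1, n + 1) - 0.5) / n

# x-axis: quantiles of the standard normal distribution at those probabilities
theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])

# y-axis: the observed data, sorted from smallest to largest
sample = np.sort(data)

# a correlation close to 1 means the points lie close to a straight line
corr = np.corrcoef(theoretical, sample)[0, 1]
print(f"correlation between theoretical and sample quantiles: {corr:.4f}")
```

Plotting sample against theoretical as a scatter plot gives the same kind of picture that qqplot() draws.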

We can see in our Q-Q plot above that the data values closely follow the 45-degree line, which means the data is likely normally distributed. This shouldn’t be surprising since we generated the 1,000 data values by using the numpy.random.normal() function with a mean of 0 and a standard deviation of 1.

Consider instead if we generated a dataset of 1,000 uniformly distributed values and created a Q-Q plot for that dataset:

#create dataset of 1,000 uniformly distributed values
data = np.random.uniform(0, 1, 1000)

#generate Q-Q plot for the dataset
fig = sm.qqplot(data, line='45')
plt.show()

[Figure: Q-Q plot with straight line in Python using matplotlib]

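The data values clearly do not follow the red 45-degree line. Part of this mismatch, however, is only a difference in location and scale: uniform values between 0 and 1 have a mean of 0.5 and a standard deviation of about 0.29, while the theoretical quantiles come from a standard normal distribution. The stronger evidence against normality is the shape of the points, which bend away from a straight line at both ends because the uniform distribution has lighter tails than the normal distribution. To judge shape independently of location and scale, qqplot() also accepts line='s' or line='r', which draw a reference line that accounts for the mean and spread of the data.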

Notes on Q-Q Plots

Keep in mind the following notes about Q-Q plots:

  • Although a Q-Q plot isn’t a formal statistical test, it offers an easy way to visually check whether or not a data set is normally distributed.
  • Be careful not to confuse Q-Q plots with P-P plots, which are less commonly used and not as useful for analyzing data values that fall on the extreme tails of the distribution.
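To make the difference between the two plot types concrete, here is a minimal sketch that computes the coordinates of both for the same standard normal sample. The plotting positions (i - 0.5) / n are an assumption of this sketch, not necessarily what a given library uses.

```python
from statistics import NormalDist

import numpy as np

# sorted sample of 1,000 standard normal values (the generator seed is arbitrary)
rng = np.random.default_rng(0)
sorted_data = np.sort(rng.normal(0, 1, 1000))
n = len(sorted_data)
probs = (np.arange(1, n + 1) - 0.5) / n

std_normal = NormalDist()

# Q-Q plot: theoretical quantiles (x) against observed values (y)
qq_x = np.array([std_normal.inv_cdf(float(p)) for p in probs])
qq_y = sorted_data

# P-P plot: empirical cumulative probabilities (x) against
# theoretical cumulative probabilities of the observed values (y)
pp_x = probs
pp_y = np.array([std_normal.cdf(float(v)) for v in sorted_data])

# both P-P axes are confined to [0, 1], so extreme observations are squeezed
# into the corners, while the Q-Q axes stay in the units of the data
print("Q-Q y-range:", qq_y.min(), qq_y.max())
print("P-P y-range:", pp_y.min(), pp_y.max())
```

Because the P-P coordinates are probabilities, the most extreme observations all land very close to 0 or 1, which is why deviations in the tails are easier to see on a Q-Q plot.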


One Reply to “How to Create a Q-Q Plot in Python”

  1. zach, please correct this tutorial as it’s super misleading. If, for example, you plot
    data = np.random.normal(0,5, 1000)
    and then add a 45-degree line, the points won’t be on the line as this tutorial indicates.
    For comparing data with a distribution it is better to use
    line='r', which will fit a regression line to the data. The analysis you use for your second example is simply wrong…
