How to Create a Q-Q Plot in Python

Q-Q plot, short for “quantile-quantile” plot, is often used to assess whether or not a set of data potentially came from some theoretical distribution.

In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.

This tutorial explains how to create a Q-Q plot for a set of data in Python.

Example: Q-Q Plot in Python

Suppose we have the following dataset of 100 values:

import numpy as np

#create dataset with 100 values that follow a normal distribution
data = np.random.normal(0,1, 1000)

#view first 10 values

array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
       -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ])

To create a Q-Q plot for this dataset, we can use the qqplot() function from the statsmodels library:

import statsmodels.api as sm
import matplotlib.pyplot as plt

#create Q-Q plot with 45-degree line added to plot
fig = sm.qqplot(data, line='45')

Q-Q plot in Python

In a Q-Q plot, the x-axis displays the theoretical quantiles. This means it doesn’t show your actual data, but instead it represents where your data would be if it were normally distributed.

The y-axis displays your actual data. This means that if the data values fall along a roughly straight line at a 45-degree angle, then the data is normally distributed.

We can see in our Q-Q plot above that the data values tend to closely follow the 45-degree, which means the data is likely normally distributed. This shouldn’t be surprising since we generated the 100 data values by using the numpy.random.normal() function.

Consider instead if we generated a dataset of 100 uniformally distributed values and created a Q-Q plot for that dataset:

#create dataset of 100 uniformally distributed values
data = np.random.uniform(0,1, 1000)

#generate Q-Q plot for the dataset
fig = sm.qqplot(data, line='45')

Q-Q plot with straight line in Python using matplotlib

The data values clearly do not follow the red 45-degree line, which is an indication that they do not follow a normal distribution.

Notes on Q-Q Plots

Keep in mind the following notes about Q-Q plots:

  • Although a Q-Q plot isn’t a formal statistical test, it offers an easy way to visually check whether or not a data set is normally distributed.
  • Be careful not to confuse Q-Q plots with P-P plots, which are less commonly used and not as useful for analyzing data values that fall on the extreme tails of the distribution.

You can find more Python tutorials here.

Leave a Reply

Your email address will not be published. Required fields are marked *