A **Q-Q plot**, short for “quantile-quantile” plot, is used to assess whether a set of data plausibly came from some theoretical distribution.

In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.

If the data is normally distributed, the points in a Q-Q plot will fall approximately along a straight diagonal line.

Conversely, the more the points in the plot deviate from a straight diagonal line, the less likely it is that the set of data follows a normal distribution.
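To make concrete what a normal Q-Q plot displays, the points can be sketched by hand: each sorted data value is paired with the standard normal quantile that matches its rank. The following is a minimal sketch, not a definitive implementation. It assumes the built-in **sashelp.class** dataset and its **height** variable purely for illustration, and the names **qq_points** and **normal_quantile** are hypothetical.

```sas
/*compute the normal score for each observation using Blom's formula*/
proc rank data=sashelp.class normal=blom out=qq_points;
    var height;
    ranks normal_quantile; /*hypothetical name for the theoretical quantiles*/
run;

/*plot the observed values against the theoretical normal quantiles*/
proc sgplot data=qq_points;
    scatter x=normal_quantile y=height;
    xaxis label="Normal Quantiles";
    yaxis label="Height";
run;
```

The resulting scatterplot should closely resemble a normal Q-Q plot of the same variable, although tied values may be handled slightly differently.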

The easiest way to create a Q-Q plot in SAS is to use the **QQPLOT** statement within **PROC UNIVARIATE**, which compares the data to a normal distribution by default:

    proc univariate data=my_data noprint;
        qqplot my_variable;
    run;

The following examples show how to use this syntax in practice.

**Note**: We use the **NOPRINT** option to suppress the summary statistics and tables that **PROC UNIVARIATE** generates by default.

**Example 1: Create Q-Q Plot in SAS for Normal Data**

The following code shows how to create a Q-Q plot for a dataset that contains 1,000 observations generated from a normal distribution with a mean of 10 and standard deviation of 2:

    /*generate 1000 values that follow a normal distribution with mean 10 and sd 2*/
    data normal_data;
        do i = 1 to 1000;
            x = 10 + 2*rannor(1);
            output;
        end;
    run;

    /*create q-q plot*/
    proc univariate data=normal_data noprint;
        qqplot x;
    run;

We can see that the points lie mostly along a straight diagonal line with some minor deviations along each of the tails.

Based on this plot, it is reasonable to assume that this set of data is normally distributed.
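Judging how straight the points are is easier with a reference line drawn through them. The following is a minimal sketch, assuming the **normal_data** dataset created above. The **NORMAL** option with **MU=EST** and **SIGMA=EST** asks SAS to draw a line based on the sample mean and sample standard deviation.

```sas
/*create q-q plot with a reference line from the estimated mean and sd*/
proc univariate data=normal_data noprint;
    qqplot x / normal(mu=est sigma=est);
run;
```

If the data is normally distributed, the points should fall close to this line, with the largest departures expected in the tails.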

**Example 2: Q-Q Plot for Non-Normal Data**

The following code shows how to create a Q-Q plot for a dataset that contains 1,000 observations generated from an exponential distribution:

    /*generate 1000 values that follow an exponential distribution*/
    data exp_data;
        do i = 1 to 1000;
            x = ranexp(1);
            output;
        end;
    run;

    /*create q-q plot*/
    proc univariate data=exp_data noprint;
        qqplot x;
    run;

We can see that the points deviate significantly from a straight diagonal line. This is a clear indication that the set of data is not normally distributed.

This should make sense considering we specified that the data should follow an exponential distribution.
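Since a Q-Q plot can compare data against theoretical distributions other than the normal, the same data can also be checked against an exponential distribution. The following is a minimal sketch, assuming the **exp_data** dataset created above. Setting **THETA=0** is an assumption that the threshold parameter is zero, which fits values generated by **ranexp**, and **SIGMA=EST** requests an estimate of the scale parameter for the reference line.

```sas
/*create exponential q-q plot with a reference line*/
/*theta=0 assumes a zero threshold; sigma=est estimates the scale*/
proc univariate data=exp_data noprint;
    qqplot x / exponential(theta=0 sigma=est);
run;
```

In this version of the plot, the points would be expected to fall much closer to a straight line, since the theoretical distribution now matches the one used to generate the data.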

