Logistic regression is a method we can use to fit a regression model when the response variable is binary.

Logistic regression uses a method known as *maximum likelihood estimation* to find an equation of the following form:

$$\log\left[\frac{p(X)}{1-p(X)}\right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where:

- $X_j$: The j-th predictor variable
- $\beta_j$: The coefficient estimate for the j-th predictor variable

The formula on the right side of the equation predicts the **log odds** of the response variable taking on a value of 1.
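Because log odds are hard to read directly, it can help to solve the equation for the probability itself. Exponentiating both sides and rearranging gives the following expression, which is only an algebraic restatement of the model above and uses the same symbols:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$$

The result always lies between 0 and 1, which is why this form is suitable for a binary response.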

The following step-by-step example shows how to fit a logistic regression model in SAS.

**Step 1: Create the Dataset**

First, we’ll create a dataset that contains information on the following three variables for 18 students:

- Acceptance into a certain college (1 = yes, 0 = no)
- GPA (scale of 1 to 4)
- ACT score (scale of 1 to 36)

**/*create dataset*/
data my_data;
input acceptance gpa act;
datalines;
1 3 30
0 1 21
0 2 26
0 1 24
1 3 29
1 3 34
0 3 31
1 2 29
0 1 21
1 2 21
0 1 15
1 3 32
1 4 31
1 4 29
0 1 24
1 4 29
1 3 21
1 4 34
;
run;
/*view dataset*/
proc print data=my_data;
run;**
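Before fitting a model, it can be worth confirming that the data were read in as intended. The following is a minimal sketch of such a check, assuming the dataset is named my_data as above. It tabulates the response variable and summarizes the two predictors.

```sas
/*count how many students were accepted (1) vs. not accepted (0)*/
proc freq data=my_data;
    tables acceptance;
run;

/*check the number of rows and the range of each predictor*/
proc means data=my_data n min max mean;
    var gpa act;
run;
```

With the values entered above, the frequency table should show 18 observations in total, with acceptance taking only the values 0 and 1.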

**Step 2: Fit the Logistic Regression Model**

Next, we’ll use **proc logistic** to fit the logistic regression model, using “acceptance” as the response variable and “gpa” and “act” as the predictor variables.

**Note**: We must specify **descending** so SAS knows to predict the probability that the response variable will take on a value of 1. By default, SAS predicts the probability that the response variable will take on a value of 0.

**/*fit logistic regression model*/
proc logistic data=my_data descending;
model acceptance = gpa act;
run;**
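As an alternative to the **descending** option, the response level to model can be named directly in the model statement. The sketch below assumes the same my_data dataset and should produce the same fit. It uses the **event=** response option to state that a value of 1 is the event of interest.

```sas
/*equivalent fit: name the event level explicitly instead of using descending*/
proc logistic data=my_data;
    model acceptance(event='1') = gpa act;
run;
```

Some users prefer this form because the modeled outcome is visible in the model statement itself.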

The first table of interest is titled **Model Fit Statistics**.

From this table we can see the AIC value of the model, which turns out to be **16.595**. The lower the AIC value, the better a model is able to fit the data.

However, there is no threshold for what is considered a “good” AIC value. Rather, we use AIC to compare the fit of several models fit to the same dataset. The model with the lowest AIC value is generally considered the best.
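To illustrate how AIC is used for comparison, the sketch below fits two candidate models to the same dataset. The choice of a GPA-only model as the competitor is an assumption made for illustration and is not part of the original example. AIC is computed as -2 times the log-likelihood plus 2 times the number of estimated parameters, so it rewards fit but penalizes extra predictors.

```sas
/*candidate model 1: gpa only (hypothetical competitor, for illustration)*/
proc logistic data=my_data descending;
    model acceptance = gpa;
run;

/*candidate model 2: gpa and act (the model fit above)*/
proc logistic data=my_data descending;
    model acceptance = gpa act;
run;
```

In each **Model Fit Statistics** table, compare the AIC values in the column labeled Intercept and Covariates. The model with the lower value would generally be preferred.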

The next table of interest is titled **Testing Global Null Hypothesis: BETA=0**.

From this table we can see the Likelihood Ratio Chi-square value of **13.4620** with a corresponding p-value of **0.0012**.

Since this p-value is less than .05, this tells us that the logistic regression model as a whole is statistically significant.
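The p-value in this table can be reproduced by hand from the chi-square statistic. The sketch below assumes 2 degrees of freedom (one for each predictor, gpa and act) and uses the PROBCHI function, which returns the cumulative probability of a chi-square distribution.

```sas
/*p-value for a chi-square statistic of 13.4620 with 2 degrees of freedom*/
data _null_;
    chi_sq = 13.4620;
    df = 2;                             /*one per predictor: gpa and act*/
    p_value = 1 - probchi(chi_sq, df);  /*upper-tail probability*/
    put p_value= 6.4;                   /*write the result to the SAS log*/
run;
```

The value written to the log should be close to the 0.0012 reported in the table.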

Next, we can analyze the coefficient estimates in the table titled **Analysis of Maximum Likelihood Estimates**.

From this table we can see the coefficients for gpa and act, which indicate the average change in log odds of getting accepted into the university for a one unit increase in each variable.

For example:

- A one-unit increase in GPA is associated with an average increase of **2.9665** in the log odds of getting accepted into the university.
- A one-unit increase in ACT score is associated with an average *decrease* of **0.1145** in the log odds of getting accepted into the university.
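Changes in log odds are easier to interpret after exponentiating them into odds ratios. The sketch below applies the EXP function to the two coefficient estimates quoted above. The variable names or_gpa and or_act are chosen here for illustration.

```sas
/*convert the coefficient estimates from log odds to odds ratios*/
data _null_;
    or_gpa = exp(2.9665);    /*odds ratio for a one-unit increase in gpa*/
    or_act = exp(-0.1145);   /*odds ratio for a one-unit increase in act*/
    put or_gpa= 8.3 or_act= 8.3;
run;
```

The results come out to roughly 19.4 for GPA and 0.89 for ACT. In other words, each additional GPA point multiplies the estimated odds of acceptance by about 19, while each additional ACT point multiplies them by about 0.89. PROC LOGISTIC also reports these values in its **Odds Ratio Estimates** table.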

The corresponding p-values in the output also give us an idea of how effective each predictor variable is at predicting the probability of getting accepted:

- P-value of GPA: **0.0679**
- P-value of ACT: **0.6289**

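This tells us that, at the .05 significance level, neither predictor is individually statistically significant. GPA (p = 0.0679) is significant only at the more lenient .10 level, while ACT score (p = 0.6289) is far from significant once GPA is included in the model.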
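To see what the fitted model predicts for each student, the predicted probabilities can be saved to a dataset. The sketch below uses the **output** statement of PROC LOGISTIC. The dataset name preds and the variable name pred_prob are hypothetical names chosen for illustration.

```sas
/*refit the model and save the predicted probability of acceptance*/
proc logistic data=my_data descending;
    model acceptance = gpa act;
    output out=preds p=pred_prob;
run;

/*view the observed response next to the predicted probability*/
proc print data=preds;
    var acceptance gpa act pred_prob;
run;
```

Because **descending** is specified, pred_prob is the estimated probability that acceptance equals 1 for each student.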

**Additional Resources**

The following tutorials explain how to fit other regression models in SAS:

How to Perform Simple Linear Regression in SAS

How to Perform Multiple Linear Regression in SAS