A dummy variable is a variable that we create in regression analysis to represent a categorical variable numerically. It takes on one of two values: zero or one.
For example, suppose we have a dataset that contains the income, age, and marital status of 11 individuals, and we would like to use age and marital status to predict income.
To use marital status as a predictor variable in a regression model, we must convert it into a dummy variable.
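Since it is currently a categorical variable that can take on k = 3 different values (“Single”, “Married”, or “Divorced”), we need to create k-1 = 3-1 = 2 dummy variables. The remaining category serves as the baseline and is represented by a value of zero on both dummy variables.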
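To create these dummy variables, we can let “Single” be our baseline value since it occurs most often. Thus, here’s how we would convert marital status into dummy variables:
Married = 1 if the individual is married, 0 otherwise
Divorced = 1 if the individual is divorced, 0 otherwise
An individual who is single has a value of 0 for both dummy variables.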
The following example shows how to create dummy variables for this exact dataset in SAS.
Example: Creating Dummy Variables in SAS
First, let’s create the following dataset in SAS:
/*create dataset*/
data original_data;
    input income age status $;
    datalines;
45 23 single
48 25 single
54 24 single
57 29 single
65 38 married
69 36 single
78 40 married
83 59 divorced
98 56 divorced
104 64 married
107 53 married
;
run;

/*view dataset*/
proc print data=original_data;
run;
Next, we can use two IF-THEN-ELSE statements to create dummy variables for the status variable:
/*create new dataset with dummy variables*/
data new_data;
    set original_data;
    if status = "married" then married = 1;
    else married = 0;
    if status = "divorced" then divorced = 1;
    else divorced = 0;
run;

/*view new dataset*/
proc print data=new_data;
run;
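As an alternative sketch, the same two columns can be produced more compactly, because a comparison in a SAS DATA step evaluates to 1 when true and 0 when false. The dataset name new_data_short is a hypothetical name chosen for illustration; it is not part of the original example.

```sas
/*alternative sketch: create the dummy variables from logical expressions*/
/*new_data_short is a hypothetical dataset name used only for illustration*/
data new_data_short;
    set original_data;
    married = (status = "married");   /*1 if married, 0 otherwise*/
    divorced = (status = "divorced"); /*1 if divorced, 0 otherwise*/
run;

/*view the result*/
proc print data=new_data_short;
run;
```

This form should give the same values as the IF-THEN-ELSE version. Which one to use is mostly a matter of taste, although the IF-THEN-ELSE version may be easier to read for newcomers.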
Notice that the values for the two dummy variables (married and divorced) match the values we calculated in the introductory example.
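One way to check the coding, sketched here as an optional step, is to cross-tabulate the original status variable against each dummy variable with PROC FREQ. Assuming new_data was created as above, every row with a status of "married" should fall under married = 1, every row with a status of "divorced" should fall under divorced = 1, and the "single" rows should fall under 0 in both tables.

```sas
/*optional check: cross-tabulate status against each dummy variable*/
proc freq data=new_data;
    tables status*married status*divorced / norow nocol nopercent;
run;
```

The NOROW, NOCOL, and NOPERCENT options suppress the percentages so that each cell shows only a count.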
We could then use these dummy variables in a regression model if we’d like since they’re both numeric.
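As a minimal sketch of that last step, the dummy variables could be passed to PROC REG along with age. This assumes the new_data dataset created above, and it is only one of several procedures that could be used to fit such a model.

```sas
/*sketch: fit a regression model using age and the two dummy variables*/
proc reg data=new_data;
    model income = age married divorced;
run;
quit;
```

Because “Single” is the baseline, the coefficients for married and divorced would each be interpreted as the average difference in income relative to a single individual, holding age constant.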