How to Perform a Log Transformation in SAS

Many statistical tests assume that the values of a variable are normally distributed.

However, values are often not normally distributed in practice. One way to address this is to transform the variable by taking the log of each value, which requires every value to be greater than zero.

For right-skewed data, this transformation often produces a variable that is closer to normally distributed.

The following example shows how to perform a log transformation on a variable in SAS.

Example: Log Transformation in SAS

Suppose we have the following dataset in SAS:

```
/*create dataset*/
data my_data;
input x;
datalines;
1
1
1
2
2
2
2
2
2
3
3
3
6
7
8
;
run;

/*view dataset*/
proc print data=my_data;
run;
```

We can use PROC UNIVARIATE to perform normality tests on the variable x to determine if it is normally distributed and also create a histogram to visualize the distribution of values:

```
/*create histogram and perform normality tests*/
proc univariate data=my_data normal;
histogram x;
run;
```

From the last table titled Tests for Normality we can see that the p-value for the Shapiro-Wilk test is less than .05, so we reject the null hypothesis that the variable x is normally distributed at the .05 significance level.
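
PROC UNIVARIATE prints several tables, so it can help to display only the normality tests. The following is a minimal sketch that assumes the ODS table name TestsForNormality, which is the name PROC UNIVARIATE uses for this table. The VAR statement is added here to restrict the analysis to x:

```
/*display only the Tests for Normality table for x*/
ods select TestsForNormality;
proc univariate data=my_data normal;
var x;
run;
```

The ODS SELECT statement applies only to the next procedure step, so later procedures print their full output again.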

The histogram also shows that the distribution of values does not appear to be normally distributed:

We can attempt a log transformation on the original dataset to see if we can produce a dataset that is more normally distributed.

We can use the following code to create a new dataset in SAS in which we take the log of each of the original x values:

```
/*use log transformation to create new dataset*/
data log_data;
set my_data;
x = log(x); /*LOG returns the natural log*/
run;

/*view log transformed data*/
proc print data=log_data;
run;
```
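
If the original values should stay available for comparison, the transformed values can be written to a new variable instead of overwriting x. The following is a minimal sketch. The names log_data2, log_x and log10_x are chosen here for illustration and are not part of the original example. LOG returns the natural log and LOG10 returns the base-10 log. Both are defined only for positive values, so the sketch leaves the new variables missing for any x that is zero or negative:

```
/*keep x and store the transformed values in new variables*/
data log_data2;
set my_data;
/*LOG and LOG10 are undefined for x <= 0, so only transform positive values*/
if x > 0 then do;
log_x = log(x);     /*natural log*/
log10_x = log10(x); /*base-10 log*/
end;
run;

/*view original and transformed values side by side*/
proc print data=log_data2;
run;
```

Every value of x in my_data is positive, so no missing values are produced for this dataset.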

We can then use PROC UNIVARIATE once again to perform normality tests on the transformed variable and produce a histogram as well:

```
/*create histogram and perform normality tests*/
proc univariate data=log_data normal;
histogram x;
run;
```

From the last table titled Tests for Normality we can see that the p-value for the Shapiro-Wilk test is now greater than .05, so we fail to reject the null hypothesis that the transformed variable is normally distributed.

The histogram also shows that the distribution of values is slightly more normally distributed than it was before the transformation:

Based on the results of the Shapiro-Wilk test and the histogram, we would conclude that the log transformation produced a variable that is closer to normally distributed than the original variable.
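
With only 15 observations a histogram can be hard to read, so a normal Q-Q plot is one possible additional check. The following is a minimal sketch using the QQPLOT statement of PROC UNIVARIATE. The choice to estimate the reference line from the sample mean and standard deviation (mu=est sigma=est) is an assumption made for this illustration:

```
/*create normal Q-Q plot for the log transformed variable*/
proc univariate data=log_data normal;
var x;
qqplot x / normal(mu=est sigma=est);
run;
```

Points that fall close to the reference line suggest that the transformed values are approximately normally distributed.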