To **winsorize** data means to set extreme outliers equal to a specified percentile of the data.

For example, a 90% winsorization sets all observations greater than the 95th percentile equal to the value at the 95th percentile and all observations less than the 5th percentile equal to the value at the 5th percentile.
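The idea can be sketched in base R by clamping each value to the two percentile cut-offs. This is only an illustration of the concept. `winsorize_manual()` is a hypothetical helper, not part of DescTools, and it assumes R's default quantile algorithm (`type = 7`).

```r
# Hypothetical helper (not part of DescTools): clamp values to the
# lower and upper percentiles given in `probs`, using base R only.
winsorize_manual <- function(x, probs = c(0.05, 0.95)) {
  # Percentile cut-offs, using R's default quantile algorithm (type 7)
  limits <- quantile(x, probs = probs, names = FALSE)
  # Raise values below the lower cut-off and lower values above the upper cut-off
  pmin(pmax(x, limits[1]), limits[2])
}

# A 90% winsorization of a small vector with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
x_w <- winsorize_manual(x)
x_w
```

Here only the smallest and largest values move, to 1.45 and 59.05. The extreme value of 100 is pulled in, and everything between the two cut-offs is left untouched.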

The easiest way to winsorize data in R is with the **Winsorize()** function from the **DescTools** package.

This function uses the following basic syntax:

**Winsorize(x, minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE, type = 7)**

where:

- **x**: The numeric vector to winsorize
- **minval**: All values lower than this value will be replaced by this value (default is the 5% quantile of x)
- **maxval**: All values larger than this value will be replaced by this value (default is the 95% quantile of x)
- **probs**: Numeric vector of probabilities, as used in **quantile()**
- **na.rm**: Whether to omit NA values when calculating the quantiles
- **type**: An integer between 1 and 9 selecting one of the nine quantile algorithms described in the **quantile()** documentation
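The **type** argument can change the cut-offs noticeably on small samples. The sketch below calls base R's `quantile()` directly rather than **Winsorize()**, on the assumption that **Winsorize()** passes `type` through to `quantile()` as described above. It uses the sales figures from the example later in this tutorial.

```r
# Sales figures used in the example later in this tutorial
sales <- c(3, 14, 16, 16, 17, 29, 34, 36, 39, 47, 59, 64, 65, 66, 68, 79, 91, 98)

# type = 7 (R's default) interpolates between neighbouring sorted values
q7 <- quantile(sales, probs = c(0.05, 0.95), type = 7, names = FALSE)
q7

# type = 1 inverts the empirical CDF, so it always returns observed values
q1 <- quantile(sales, probs = c(0.05, 0.95), type = 1, names = FALSE)
q1
```

With 18 observations, `type = 1` returns the minimum (3) and maximum (98) themselves. A winsorization based on those cut-offs would therefore leave this data unchanged, while the default `type = 7` gives 12.35 and 92.05.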

The following example shows how to use the **Winsorize()** function in practice.

**Note**: Before using the **Winsorize()** function, you first need to make sure that the **DescTools** package is installed.

You can use the following syntax to install this package:

**install.packages('DescTools')**

Once the **DescTools** package is installed, load it with **library(DescTools)** and the **Winsorize()** function will be available.

**Example: How to Winsorize Data in R**

Suppose that we create the following data frame that contains information about total sales made by various employees at some company:

```r
# create data frame
df <- data.frame(emp = LETTERS[1:18],
                 sales = c(3, 14, 16, 16, 17, 29, 34, 36, 39, 47, 59, 64, 65, 66, 68, 79, 91, 98))

# view data frame
df

   emp sales
1    A     3
2    B    14
3    C    16
4    D    16
5    E    17
6    F    29
7    G    34
8    H    36
9    I    39
10   J    47
11   K    59
12   L    64
13   M    65
14   N    66
15   O    68
16   P    79
17   Q    91
18   R    98
```

Suppose that we would like to winsorize the values in the **sales** column such that any sales value greater than the 95th percentile is set to the 95th percentile and any value less than the 5th percentile is set to the 5th percentile.

We can use the **Winsorize()** function to do so:

```r
library(DescTools)

# winsorize values in sales column of data frame
df$sales <- Winsorize(df$sales)

# view updated data frame
df

   emp sales
1    A 12.35
2    B 14.00
3    C 16.00
4    D 16.00
5    E 17.00
6    F 29.00
7    G 34.00
8    H 36.00
9    I 39.00
10   J 47.00
11   K 59.00
12   L 64.00
13   M 65.00
14   N 66.00
15   O 68.00
16   P 79.00
17   Q 91.00
18   R 92.05
```

Notice that every value in the **sales** column is unchanged except the first and last. These two values (the smallest and largest in the column) have been winsorized to the 5th and 95th percentiles, respectively.
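To confirm which rows were altered, you can compare the column before and after winsorizing. The sketch below reproduces the default behaviour with base R (`quantile()` plus `pmin()`/`pmax()`) instead of calling **Winsorize()**. It assumes the default cut-offs are the type 7 quantiles at 0.05 and 0.95, as the syntax section describes.

```r
# Sales data from the example above
sales <- c(3, 14, 16, 16, 17, 29, 34, 36, 39, 47, 59, 64, 65, 66, 68, 79, 91, 98)

# Reproduce the default 5th/95th percentile clamp with base R
limits <- quantile(sales, probs = c(0.05, 0.95), names = FALSE)
sales_w <- pmin(pmax(sales, limits[1]), limits[2])

# Positions whose values were altered by the winsorization
changed <- which(sales != sales_w)
data.frame(emp = LETTERS[changed], before = sales[changed], after = sales_w[changed])
```

Only employees A and R appear in the comparison, matching the output shown above.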

Specifically, we can see that the minimum value of **3** has been replaced with **12.35**, which represents the 5th percentile of values in the **sales** column.

We can also see that the maximum value of **98** has been replaced with **92.05**, which represents the 95th percentile of values in the **sales** column.
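Both cut-offs can be checked by hand. R's default quantile algorithm (`type = 7`) finds the position $h$ in the sorted data and interpolates linearly between the neighbouring sorted values $x_{(k)}$:

$$
h = (n - 1)\,p + 1, \qquad
Q(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)
$$

With $n = 18$ sales values:

$$
\begin{aligned}
p = 0.05:&\quad h = 17(0.05) + 1 = 1.85, \quad Q(0.05) = 3 + 0.85\,(14 - 3) = 12.35 \\
p = 0.95:&\quad h = 17(0.95) + 1 = 17.15, \quad Q(0.95) = 91 + 0.15\,(98 - 91) = 92.05
\end{aligned}
$$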
