How to Remove Columns with Same Value in R


Often you may want to remove columns from a data frame in R that contain the same value in every single row.

Here are the most common ways to do so:

Method 1: Remove Columns with Same Value Using Base R

#remove any columns that contain the same value in every row
Filter(function(x) length(unique(x)) > 1, df)

This method uses only base R functions to remove every column that contains the same value in each row.

It works by passing a test function to Filter(), which applies the test to each column and returns only the columns that have more than one unique value. The original data frame is not modified, so assign the result to a variable if you want to keep it.
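To make the filtering step more concrete, here is a minimal sketch that performs the same test by hand and then with Filter(). The data frame demo and its columns are made up purely for illustration:

```r
#small illustrative data frame (hypothetical values)
demo <- data.frame(id=1:4,
                   group=c('x', 'x', 'y', 'y'),
                   flag=c(1, 1, 1, 1))

#TRUE for each column that has more than one unique value
keep <- vapply(demo, function(x) length(unique(x)) > 1, logical(1))
keep

#keep only the columns flagged TRUE
demo_manual <- demo[keep]

#Filter() runs the same test and subset in a single call
demo_filter <- Filter(function(x) length(unique(x)) > 1, demo)

#column names that remain after filtering
names(demo_filter)
```

In both cases the result is stored in a new object, and demo itself still contains all three columns.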

Method 2: Remove Columns with Same Value Using dplyr

library(dplyr)

#remove any columns that contain the same value in every row
df %>% select(where(~n_distinct(.) > 1))

This method uses the select(), where() and n_distinct() functions from the dplyr package to remove every column that contains the same value in each row.

It works by selecting only the columns in which the number of distinct values is greater than 1.
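As a rough illustration of what the where() condition is checking, the sketch below first counts the distinct values in each column and then applies the selection. The data frame demo is a made-up example, and the code assumes the dplyr package is installed:

```r
library(dplyr)

#small illustrative data frame (hypothetical values)
demo <- data.frame(id=1:4,
                   group=c('x', 'x', 'y', 'y'),
                   flag=c(1, 1, 1, 1))

#number of distinct values in each column
distinct_counts <- sapply(demo, n_distinct)
distinct_counts

#keep the columns for which the condition is TRUE
demo_kept <- demo %>% select(where(~n_distinct(.) > 1))
names(demo_kept)

#flip the condition to inspect the columns that would be dropped
demo_dropped <- demo %>% select(where(~n_distinct(.) == 1))
names(demo_dropped)
```

Looking at the dropped columns first can be a useful sanity check before discarding them.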

Both of these methods will produce the same result.

Note that the two approaches may differ in speed on extremely large data frames, so it is worth timing both on your own data if performance matters.
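If speed matters, one way to compare the two methods is to time them on data of a similar size to your own. The sketch below uses a synthetic data frame named big (an assumption for illustration only) and base R's system.time(). Timings depend on the machine and the data, so none are shown here:

```r
library(dplyr)

#synthetic data frame for timing (hypothetical size and columns)
set.seed(1)
n <- 100000
big <- data.frame(a=rnorm(n),
                  b=sample(letters, n, replace=TRUE),
                  c=rep(1, n))

#time the base R approach
time_base <- system.time(
  res_base <- Filter(function(x) length(unique(x)) > 1, big)
)

#time the dplyr approach
time_dplyr <- system.time(
  res_dplyr <- big %>% select(where(~n_distinct(.) > 1))
)

#compare elapsed times
time_base['elapsed']
time_dplyr['elapsed']

#confirm both approaches kept the same columns
identical(names(res_base), names(res_dplyr))
```

A single run like this is only a rough guide. Repeating the timing several times gives a more reliable comparison.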

The examples below show how to use each of these methods in practice with the following data frame in R, which contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 28, 31, 35, 34, 45, 28, 31),
                 steals=c(2, 2, 2, 2, 2, 2, 2, 2),
                 assists=c(8, 10, 12, 12, 8, 4, 3, 9))

#view data frame
df

  team position points steals assists
1    A        G     22      2       8
2    A        G     28      2      10
3    A        F     31      2      12
4    A        F     35      2      12
5    B        G     34      2       8
6    B        G     45      2       4
7    B        F     28      2       3
8    B        F     31      2       9

Let’s jump in!

Example 1: Remove Columns with Same Value Using Base R

One way to remove columns that contain the same value in every single row is to use the following syntax in base R:

#remove any columns that contain the same value in every row
Filter(function(x) length(unique(x)) > 1, df)

  team position points assists
1    A        G     22       8
2    A        G     28      10
3    A        F     31      12
4    A        F     35      12
5    B        G     34       8
6    B        G     45       4
7    B        F     28       3
8    B        F     31       9

Notice that this returns all columns from the data frame except for the steals column, which contained the value 2 in every single row of the column.

Every other column in the data frame had at least two unique values in the column.
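To confirm this, one option is to count the unique values in each column directly. The sketch below recreates the same data frame so that it runs on its own. The names unique_counts and constant_cols are chosen here for illustration:

```r
#recreate the data frame from this tutorial
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 28, 31, 35, 34, 45, 28, 31),
                 steals=c(2, 2, 2, 2, 2, 2, 2, 2),
                 assists=c(8, 10, 12, 12, 8, 4, 3, 9))

#count unique values in each column
unique_counts <- sapply(df, function(x) length(unique(x)))
unique_counts
#team: 2, position: 2, points: 6, steals: 1, assists: 6

#names of columns with only one unique value
constant_cols <- names(df)[unique_counts == 1]
constant_cols
#"steals"
```

Only the steals column has a single unique value, which is why it is the only column removed.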

Example 2: Remove Columns with Same Value Using dplyr

Another way to remove columns that contain the same value in every single row is to use the following syntax from the dplyr package in R:

library(dplyr)

#remove any columns that contain the same value in every row
df %>% select(where(~n_distinct(.) > 1))

  team position points assists
1    A        G     22       8
2    A        G     28      10
3    A        F     31      12
4    A        F     35      12
5    B        G     34       8
6    B        G     45       4
7    B        F     28       3
8    B        F     31       9

Notice that this also returns all columns from the data frame except for the steals column, which contained the same value in every single row of the column.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Add Row to Data Frame Using dplyr
How to Arrange Rows in Custom Order Using dplyr
How to Filter Based on Factor in dplyr
How to Use top_n() in dplyr
