How to Remove Rows Using dplyr (With Examples)


You can use the following basic syntax to remove rows from a data frame in R using dplyr:

1. Remove any row with NA values

df %>%
  na.omit()

2. Remove any row with NA values in a specific column

df %>%
  filter(!is.na(column_name))

3. Remove duplicates

df %>%
  distinct()

4. Remove rows by index position

df %>%
  filter(!row_number() %in% c(1, 2, 4))

5. Remove rows based on condition (filter() keeps only the rows that meet the condition)

df %>%
  filter(column1=='A' | column2 > 8)

The examples below show how to use each of these methods in practice with the following data frame:

library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#view data frame
df

  team points assists
1    A      4       1
2    A     NA       3
3    B      7       5
4    B      5      NA
5    C      9       2
6    C      9       2

Example 1: Remove Any Row with NA Values

The following code shows how to remove any row with NA values from the data frame:

#remove any row with NA
df %>%
  na.omit()

  team points assists
1    A      4       1
3    B      7       5
5    C      9       2
6    C      9       2
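
Note that na.omit() comes from base R rather than dplyr, which is why the original row numbers (1, 3, 5, 6) are kept in the output. As a minimal sketch of a dplyr-only alternative, assuming dplyr 1.0.4 or later so that if_all() is available, the same rows can be kept with filter(). The name complete_rows is only an illustrative choice:

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#keep only rows where every column is non-missing
complete_rows <- df %>%
  filter(if_all(everything(), ~ !is.na(.x)))

complete_rows
```

This should return the same four rows as na.omit(), but with the row numbers reset to 1 through 4.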

Example 2: Remove Any Row with NA Values in a Specific Column

The following code shows how to remove any row with NA values in a specific column:

#remove any row with NA in 'points' column:
df %>%
  filter(!is.na(points))

  team points assists
1    A      4       1
2    B      7       5
3    B      5      NA
4    C      9       2
5    C      9       2
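
Before choosing which column to filter on, it can help to see how many NA values each column contains. The following is one possible sketch using summarise() with across(), assuming dplyr 1.0.0 or later. The name na_counts is only illustrative:

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#count the NA values in each column
na_counts <- df %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts

  team points assists
1    0      1       1
```

Here the points and assists columns each contain one NA value, so filtering on points alone leaves the row with the missing assists value in place.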

Example 3: Remove Duplicate Rows

The following code shows how to remove duplicate rows:

#remove duplicate rows
df %>%
  distinct()

  team points assists
1    A      4       1
2    A     NA       3
3    B      7       5
4    B      5      NA
5    C      9       2
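
By default, distinct() compares all columns. As a short sketch of a related use, duplicates can instead be judged on selected columns only. The example below keeps the first row found for each team, and the .keep_all = TRUE argument retains the remaining columns. The name first_per_team is only illustrative:

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#keep the first row for each unique value of team
first_per_team <- df %>%
  distinct(team, .keep_all = TRUE)

first_per_team

  team points assists
1    A      4       1
2    B      7       5
3    C      9       2
```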

Example 4: Remove Rows by Index Position

The following code shows how to remove rows based on index position:

#remove rows 1, 2, and 4
df %>%
  filter(!row_number() %in% c(1, 2, 4))

  team points assists
1    B      7       5
2    C      9       2
3    C      9       2
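
As an alternative sketch, dplyr's slice() function accepts negative positions, which drops the listed rows directly. This should give the same result as the filter() approach above. The name remaining_rows is only illustrative:

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#drop rows 1, 2, and 4 using negative positions
remaining_rows <- df %>%
  slice(-c(1, 2, 4))

remaining_rows
```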

Example 5: Remove Rows Based on Condition

The following code shows how to remove rows based on specific conditions. Note that filter() keeps the rows where the condition is TRUE and removes all others:

#only keep rows where team is equal to 'A' or points is greater than 8
df %>%
  filter(team=='A' | points > 8)

  team points assists
1    A      4       1
2    A     NA       3
3    C      9       2
4    C      9       2
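
Since filter() keeps the rows that match a condition, dropping the matching rows instead requires negating the condition with the ! operator. A minimal sketch, using the same condition as above (the name other_rows is only illustrative):

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#remove rows where team is equal to 'A' or points is greater than 8
other_rows <- df %>%
  filter(!(team=='A' | points > 8))

other_rows

  team points assists
1    B      7       5
2    B      5      NA
```

Keep in mind that filter() also drops any row where the condition evaluates to NA, so columns with missing values may need an explicit is.na() check.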

Additional Resources

The following tutorials explain how to perform other common functions in dplyr:

How to Select Columns by Index Using dplyr
How to Rank Variables by Group Using dplyr
How to Replace NA with Zero in dplyr
