# How to Remove Duplicate Rows in R

Removing duplicate rows from a data frame is a common task in R. Fortunately, this is easy to do using the distinct() function from the dplyr package.

`library(dplyr)`

This tutorial walks through several examples of using this function with the following data frame:

```
#create data frame
df <- data.frame(x = c('a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e'),
                 y = c(1, 2, 2, 4, 4, 5, 9, 17, 17, 25))

#view data frame
df

   x  y
1  a  1
2  b  2
3  b  2
4  b  4
5  c  4
6  c  5
7  c  9
8  d 17
9  d 17
10 e 25
```

### Example 1: Remove Completely Duplicated Rows

The following code shows how to remove rows that are complete duplicates of other rows:

```
#display only unique rows
distinct(df)

  x  y
1 a  1
2 b  2
3 b  4
4 c  4
5 c  5
6 c  9
7 d 17
8 e 25

#find total number of rows in original data frame
nrow(df)

[1] 10

#find total number of unique rows
nrow(distinct(df))

[1] 8

#find total number of duplicate rows
nrow(df) - nrow(distinct(df))

[1] 2
```

We can see that 2 duplicate rows were removed from the data frame: the repeated (b, 2) row and the repeated (d, 17) row.
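
To inspect which rows count as duplicates before dropping them, base R's duplicated() can be used alongside distinct(). The sketch below is a base R alternative rather than part of dplyr, and the variable names (dup_flags, dup_rows, n_dups, unique_rows) are illustrative only:

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(x = c('a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e'),
                 y = c(1, 2, 2, 4, 4, 5, 9, 17, 17, 25))

# duplicated() flags each row that repeats an earlier row
dup_flags <- duplicated(df)

# Rows that would be dropped (rows 3 and 9 of the original data frame)
dup_rows <- df[dup_flags, ]

# Count the duplicate rows
n_dups <- sum(dup_flags)

# Base R counterpart of distinct(df): keep only the unflagged rows
unique_rows <- df[!dup_flags, ]
```

One difference to keep in mind: the base R subset keeps the original row names (1, 2, 4, ...), while the distinct() output above is renumbered from 1.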

### Example 2: Remove Duplicates in One Column

The following code shows how to return only the unique values in one specific column of a data frame:

```
#display only unique values in column x
distinct(df, x)

  x
1 a
2 b
3 c
4 d
5 e

#display only unique values in column y
distinct(df, y)

   y
1  1
2  2
3  4
4  5
5  9
6 17
7 25
```
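
If a plain vector of unique values is more convenient than a one-column data frame, base R's unique() is one alternative. A minimal sketch, with illustrative variable names and stringsAsFactors = FALSE set explicitly so that column x is a character vector regardless of R version:

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(x = c('a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e'),
                 y = c(1, 2, 2, 4, 4, 5, 9, 17, 17, 25),
                 stringsAsFactors = FALSE)

# Unique values of each column, returned as vectors in order of first appearance
unique_x <- unique(df$x)   # "a" "b" "c" "d" "e"
unique_y <- unique(df$y)   # 1 2 4 5 9 17 25

# Number of unique values in column y
n_unique_y <- length(unique_y)   # 7
```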

You can also remove duplicate values in one column while retaining all other columns by setting .keep_all = TRUE. For each unique value, the first row in which it appears is kept:

```
#display only unique values in column x and retain other columns
distinct(df, x, .keep_all = TRUE)

  x  y
1 a  1
2 b  2
3 c  4
4 d 17
5 e 25

#display only unique values in column y and retain other columns
distinct(df, y, .keep_all = TRUE)

  x  y
1 a  1
2 b  2
3 b  4
4 c  5
5 c  9
6 d 17
7 e 25
```
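
Which row survives matters when the other columns differ. As a rough base R sketch (not dplyr itself), duplicated() on a single column reproduces the keep-the-first-row behavior shown above, and its fromLast argument keeps the last row instead. The names first_rows and last_rows are illustrative:

```r
# Recreate the example data frame so this block runs on its own
df <- data.frame(x = c('a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e'),
                 y = c(1, 2, 2, 4, 4, 5, 9, 17, 17, 25))

# Keep the first row for each value of x
# (same rows as distinct(df, x, .keep_all = TRUE))
first_rows <- df[!duplicated(df$x), ]

# Keep the last row for each value of x instead
last_rows <- df[!duplicated(df$x, fromLast = TRUE), ]

first_rows$y   # 1 2 4 17 25
last_rows$y    # 1 4 9 17 25
```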

You can find the complete documentation for the distinct() function in the dplyr package reference.