Often you may want to find all rows in one data frame that do not occur in another data frame in R.

Fortunately this is easy to do by using the **setdiff()** function from the **dplyr** package in R, which is designed to perform this exact task.

The **setdiff()** function uses the following basic syntax:

**setdiff(x, y)**

where:

**x**: The name of the first data frame

**y**: The name of the second data frame

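Note that this function returns a data frame containing the rows of **x** that do not occur in **y**, with any duplicate rows removed. Both data frames must have the same column names.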

Also note that a similar function is the **union()** function, which uses the same syntax and will return all rows that occur in *either* data frame.
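To make the contrast concrete, here is a minimal sketch using two small toy data frames. The names **a** and **b** are hypothetical and are not part of the example that follows, and **intersect()** is included only for comparison:

```r
library(dplyr)

# hypothetical toy data frames, used only for this illustration
a <- data.frame(id = c(1, 2, 3))
b <- data.frame(id = c(3, 4))

# rows in a that do not occur in b (ids 1 and 2)
setdiff(a, b)

# distinct rows that occur in either data frame (ids 1, 2, 3 and 4)
union(a, b)

# rows that occur in both data frames (id 3)
intersect(a, b)
```

All three functions take the same two arguments, so switching between them only requires changing the function name.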

The following example shows how to use the **setdiff()** function from the **dplyr** package in practice.

**Note**: Before using the **setdiff()** function, you may need to first install the **dplyr** package by using the following syntax:

```r
install.packages('dplyr')
```

Once the **dplyr** package is installed, you can use the **setdiff()** function.
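If you are not sure whether **dplyr** is already installed, one possible approach is to install it only when it is missing. This is a sketch that relies on the base R function **requireNamespace()** and assumes a CRAN mirror is reachable:

```r
# install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# load the package so that setdiff() works on data frames
library(dplyr)
```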

**Example: How to Use the setdiff() Function in dplyr**

Suppose we create the following two data frames named **df1** and **df2**:

```r
#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34, 38, 17))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34
7    B     38
8    B     17

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 10, 11, 15, 10, 32, 38, 27))

df2

  team points
1    A     14
2    A     10
3    A     11
4    A     15
5    B     10
6    B     32
7    B     38
8    B     27
```

Suppose that we would like to find all rows in **df1** that do not occur in **df2**.

We can use the **setdiff()** function from the **dplyr** package to do so:

```r
library(dplyr)

#find all rows in df1 that do not occur in df2
df_diff <- setdiff(df1, df2)

#view resulting data frame
df_diff

  team points
1    A     19
2    A     25
3    B     40
4    B     34
5    B     17
```

Notice that the new data frame named **df_diff** contains all rows that occur in **df1** but do not occur in **df2**.

From the output we can see that a total of five rows occur in **df1** that do not occur in **df2**.
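Keep in mind that the order of the arguments matters. As a quick sketch, swapping the two data frames returns the rows of **df2** that do not occur in **df1** instead. The name **df_diff_rev** is a hypothetical one chosen for this illustration:

```r
library(dplyr)

# same data frames as in the example above
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34, 38, 17))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 10, 11, 15, 10, 32, 38, 27))

# find all rows in df2 that do not occur in df1 (arguments swapped)
df_diff_rev <- setdiff(df2, df1)

# six rows of df2 have no match in df1
nrow(df_diff_rev)
```

Only the rows (A, 14) and (B, 38) occur in both data frames, which is why six of the eight rows in **df2** are returned.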

If you would simply like to know the number of rows that occur in **df1** and not **df2**, then you can wrap the **nrow()** function around the **setdiff()** function to return the number of resulting rows.

Note that the **nrow()** function is used to return the number of rows in a given data frame.

We can use the following syntax to return the number of rows that occur in **df1** and not **df2**:

```r
library(dplyr)

#return number of rows that occur in df1 and not df2
df_diff_num <- nrow(setdiff(df1, df2))

#view results
df_diff_num

[1] 5
```

This returns a value of **5**, which tells us that there are five rows that occur in **df1** that do not occur in **df2**. This matches the result from the previous example.
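One detail to keep in mind when counting: **setdiff()** drops duplicate rows, so the count reflects distinct rows only. The sketch below uses hypothetical toy data frames **x** and **y** and compares the result with **anti_join()**, a different dplyr function that keeps every unmatched row:

```r
library(dplyr)

# hypothetical toy data: the row with id 1 appears twice in x
x <- data.frame(id = c(1, 1, 2))
y <- data.frame(id = 2)

# setdiff() collapses the duplicated row, so this count is 1
n_distinct_rows <- nrow(setdiff(x, y))

# anti_join() keeps both copies of the unmatched row, so this count is 2
n_all_rows <- nrow(anti_join(x, y, by = "id"))
```

If duplicates in the first data frame should be counted separately, **anti_join()** may be the better fit. Otherwise **setdiff()** is sufficient.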

**Note**: You can find the complete documentation for the **setdiff()** function in the **dplyr** package reference.

**Additional Resources**

The following tutorials explain how to perform other common tasks in R:

How to Use slice_min() in dplyr

How to Use the pull() Function in dplyr

How to Use top_n() in dplyr

How to Rename Columns Using dplyr