How to Use the intersect() Function in dplyr


Often you may want to return the rows that occur in both of two data frames in R.

Fortunately this is easy to do using the intersect() function from the dplyr package, which is designed to perform this exact task.

The intersect() function uses the following basic syntax:

intersect(x, y)

where:

  • x: The name of the first data frame
  • y: The name of the second data frame

Note that this function returns a data frame in which each matching row appears only once, even if that row is duplicated in x or y.

Also note that a related function is union(), which uses the same syntax and returns every distinct row that occurs in either data frame.
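As a minimal sketch of how the two functions differ, the following code applies both to two small made-up data frames. The names a and b and their values are hypothetical and are used only for illustration.

```r
library(dplyr)

#create two small hypothetical data frames with the same columns
a <- data.frame(team=c('A', 'B', 'C'), points=c(10, 20, 30))
b <- data.frame(team=c('B', 'C', 'D'), points=c(20, 30, 40))

#rows that occur in both data frames
common_rows <- intersect(a, b)

#distinct rows that occur in either data frame
either_rows <- union(a, b)

#compare the number of rows returned by each function
nrow(common_rows)
nrow(either_rows)
```

Here intersect() keeps only the two rows that both data frames share, while union() keeps the four distinct rows found across both.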

The following example shows how to use the intersect() function from the dplyr package in practice.

Note: Before using the intersect() function, you may need to first install the dplyr package by using the following syntax:

install.packages('dplyr')

Once the dplyr package is installed, load it with library(dplyr) to use the intersect() function on data frames.
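If you are not sure whether dplyr is already installed, one possible approach is to check first and install only when needed. The following is a sketch that assumes a package repository is reachable from your R session:

```r
#install dplyr only if it is not already available
if (!requireNamespace('dplyr', quietly = TRUE)) {
  install.packages('dplyr')
}

#load the package so that its intersect() method for data frames is available
library(dplyr)
```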

Example: How to Use the intersect() Function in dplyr

Suppose we create the following two data frames named df1 and df2:

#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34, 38, 17))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34
7    B     38
8    B     17

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 10, 11, 15, 10, 32, 38, 27))

df2

  team points
1    A     14
2    A     10
3    A     11
4    A     15
5    B     10
6    B     32
7    B     38
8    B     27

Suppose that we would like to return a single data frame that contains all rows that occur in both data frames.

We can use the intersect() function from the dplyr package to do so:

library(dplyr)

#return all rows that occur in both data frames
df_all <- intersect(df1, df2)

#view resulting data frame
df_all

  team points
1    A     14
2    B     38

Notice that the new data frame named df_all contains all rows that occur in both data frames.

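From the output we can see that only two distinct rows occur in both data frames. Although the row with team A and 14 points appears twice in df1, it appears only once in the result because intersect() drops duplicate rows.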
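To check how duplicate rows are handled, the following sketch counts how often the row with team A and 14 points appears in df1 and in the result. The variable names dup_count and result_count are introduced here only for illustration.

```r
library(dplyr)

#recreate the two data frames from the example
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34, 38, 17))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  points=c(14, 10, 11, 15, 10, 32, 38, 27))

#count occurrences of the row (A, 14) in df1
dup_count <- sum(df1$team == 'A' & df1$points == 14)

#count occurrences of the same row in the intersection
df_all <- intersect(df1, df2)
result_count <- sum(df_all$team == 'A' & df_all$points == 14)

dup_count
result_count
```

The first count is 2 and the second is 1, which shows that the duplicated row is kept only once in the output.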

If you only want to know how many rows occur in both data frames, you can wrap the nrow() function, which returns the number of rows in a data frame, around the intersect() function.

We can use the following syntax to return the number of rows that occur in both data frames:

library(dplyr)

#return number of rows that occur in both data frames
df_all_num <- nrow(intersect(df1, df2))

#view results
df_all_num

[1] 2

This returns a value of 2, which tells us that there are two rows that occur in both df1 and df2. This matches the result from the previous example.

Note that if the nrow() function had returned a value of 0, it would tell us that the two data frames have no rows in common.

Note: You can find the complete documentation for the intersect() function in the set operations section of the dplyr reference.
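As a brief sketch of that check, the following code tests whether two data frames share any rows. The data frames df_a and df_b are hypothetical and were chosen so that they have no rows in common.

```r
library(dplyr)

#create two hypothetical data frames with no rows in common
df_a <- data.frame(team=c('A', 'B'), points=c(1, 2))
df_b <- data.frame(team=c('C', 'D'), points=c(3, 4))

#TRUE when the two data frames share no rows
no_overlap <- nrow(intersect(df_a, df_b)) == 0

#view result
no_overlap
```

A logical value like this can be used directly in an if statement, for example to skip later steps when there is nothing to compare.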

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use slice_min() in dplyr
How to Use the pull() Function in dplyr
How to Use top_n() in dplyr
How to Rename Columns Using dplyr
