How to Use setequal() Function in dplyr


In R, you may often want to check whether two data frames contain the same rows, regardless of the order in which the rows appear.

This is easy to do with the setequal() function from the dplyr package, which is designed for exactly this task.

The setequal() function uses the following basic syntax:

setequal(x, y)

where:

  • x: The name of the first data frame
  • y: The name of the second data frame

This function returns TRUE if the two data frames contain the same rows, regardless of order, and FALSE otherwise.
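One subtlety is how duplicated rows are treated. The sketch below uses two made-up data frames, a and b, that hold the same distinct rows but differ in how often one row appears. Our understanding is that dplyr 1.1.0 and later ignore duplicated rows in setequal(), while earlier versions may not, so the code prints the installed version next to the result rather than assuming one:

```r
library(dplyr)

#hypothetical data frames: same distinct rows, but b repeats the first row
a <- data.frame(team=c('A', 'B'),
                points=c(14, 40))
b <- data.frame(team=c('A', 'A', 'B'),
                points=c(14, 14, 40))

#show which dplyr version is installed
print(packageVersion('dplyr'))

#check whether the duplicated row affects the result on this version
dup_result <- setequal(a, b)
print(dup_result)
```

If your analysis depends on how many times each row appears, it is worth running a check like this once on your own installation.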

The following example shows how to use the setequal() function from the dplyr package in practice.

Note: Before using the setequal() function, you may need to first install the dplyr package by using the following syntax:

install.packages('dplyr')

Once the dplyr package is installed, you can load it with library(dplyr) and use the setequal() function.

Example: How to Use the setequal() Function in dplyr

Suppose we create the following two data frames named df1 and df2:

#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 34))

df2

  team points
1    A     14
2    A     14
3    A     25
4    A     19
5    B     40
6    B     34

Suppose that we would like to check if the two data frames contain the same rows, regardless of row order.

We can use the setequal() function from the dplyr package to do so:

library(dplyr)

#check if both data frames contain the same rows
setequal(df1, df2)

[1] TRUE

This returns TRUE, which tells us that the two data frames contain the same rows.

Note that rows 3 and 4 appear in the opposite order in the two data frames, but because setequal() ignores row order and both data frames contain the same rows, it still returns TRUE.
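To see why this comparison returns TRUE, we can reproduce the check by hand in base R: sort both data frames by every column, reset the row names, and compare the results. This is only a sketch for illustration, not how dplyr implements setequal(), and sort_rows() is a helper name made up for this example:

```r
#recreate the two data frames from the example
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 34))

#hypothetical helper: sort rows by team and points, then reset row names
sort_rows <- function(df) {
  sorted <- df[order(df$team, df$points), ]
  rownames(sorted) <- NULL
  sorted
}

#compare the two data frames after sorting
same_after_sort <- identical(sort_rows(df1), sort_rows(df2))
print(same_after_sort)
```

This should also return TRUE, since the two data frames hold the same rows once the original ordering is removed.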

Suppose instead that we change the last value of the points column in the second data frame from 34 to 60:

#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 60))

df2

  team points
1    A     14
2    A     14
3    A     25
4    A     19
5    B     40
6    B     60

Now suppose that we would like to check if these two data frames contain the same rows.

We can use the setequal() function from the dplyr package once again to do so:

library(dplyr)

#check if both data frames contain the same rows
setequal(df1, df2)

[1] FALSE

This returns FALSE, which tells us that the two data frames do not contain the same rows.

This is the expected result since we intentionally changed the last row in the second data frame to be different.
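When setequal() returns FALSE, a natural follow-up is to find which rows differ. One way to sketch this is with the setdiff() function, which dplyr also provides for data frames and which returns the rows of the first data frame that do not appear in the second. The object names only_in_df1 and only_in_df2 are made up for this example:

```r
library(dplyr)

#recreate the two data frames from the example
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 60))

#rows that appear in df1 but not in df2
only_in_df1 <- setdiff(df1, df2)
print(only_in_df1)

#rows that appear in df2 but not in df1
only_in_df2 <- setdiff(df2, df1)
print(only_in_df2)
```

Here we would expect the first call to return the row with team B and 34 points, and the second call to return the row with team B and 60 points.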

Note: You can find the complete documentation for the setequal() function in the dplyr package reference documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Use slice_min() in dplyr
How to Use the pull() Function in dplyr
How to Use top_n() in dplyr
How to Rename Columns Using dplyr
