The Difference Between merge() vs. join() in R


The merge() function in base R and the various join() functions from the dplyr package can both be used to join two data frames together.

There are two main differences between these two functions:

1. The join() functions from dplyr tend to be much faster than merge() on extremely large data frames.

2. The join() functions from dplyr preserve the original order of rows in the data frames while the merge() function automatically sorts the rows alphabetically based on the column you used to perform the join.

The following example illustrates difference #2 in practice.

Example: The Difference Between merge() and join()

Suppose we have the following two data frames in R:

#define first data frame
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  points=c(99, 93, 96, 104))

df1

   team points
1  Mavs     99
2 Hawks     93
3 Spurs     96
4  Nets    104

#define second data frame
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
                  assists=c(19, 18, 22, 25))

df2

   team assists
1  Mavs      19
2 Hawks      18
3 Spurs      22
4  Nets      25

Suppose we use the merge() function in base R to perform a left join, using the ‘team’ column as the column to join on:

#perform left join using base R
merge(df1, df2, by='team', all.x=TRUE)

   team points assists
1 Hawks     93      18
2  Mavs     99      19
3  Nets    104      25
4 Spurs     96      22

Notice that the rows are sorted in alphabetical order based on the values in the ‘team’ column.

Now suppose we use the left_join() function from dplyr to perform a left join, again using the ‘team’ column as the column to join on:

library(dplyr)

#perform left join using dplyr 
left_join(df1, df2, by='team')

   team points assists
1  Mavs     99      19
2 Hawks     93      18
3 Spurs     96      22
4  Nets    104      25

Notice that the order of the rows match the original order of the rows in the data frame before performing the left join.

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Do an Inner Join in R
How to Perform Fuzzy Matching in R
How to Add a Column to Data Frame in R
How to Drop Columns from Data Frame in R

Leave a Reply

Your email address will not be published. Required fields are marked *