# The Difference Between merge() vs. join() in R

The merge() function in base R and the various join() functions from the dplyr package can both be used to join two data frames together.

There are two main differences between these two functions:

1. The join() functions from dplyr tend to be much faster than merge() on extremely large data frames.

2. The join() functions from dplyr preserve the original order of rows in the data frames while the merge() function automatically sorts the rows alphabetically based on the column you used to perform the join.

The following example illustrates difference #2 in practice.

### Example: The Difference Between merge() and join()

Suppose we have the following two data frames in R:

```#define first data frame
df1 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
points=c(99, 93, 96, 104))

df1

team points
1  Mavs     99
2 Hawks     93
3 Spurs     96
4  Nets    104

#define second data frame
df2 <- data.frame(team=c('Mavs', 'Hawks', 'Spurs', 'Nets'),
assists=c(19, 18, 22, 25))

df2

team assists
1  Mavs      19
2 Hawks      18
3 Spurs      22
4  Nets      25```

Suppose we use the merge() function in base R to perform a left join, using the ‘team’ column as the column to join on:

```#perform left join using base R
merge(df1, df2, by='team', all.x=TRUE)

team points assists
1 Hawks     93      18
2  Mavs     99      19
3  Nets    104      25
4 Spurs     96      22
```

Notice that the rows are sorted in alphabetical order based on the values in the ‘team’ column.

Now suppose we use the left_join() function from dplyr to perform a left join, again using the ‘team’ column as the column to join on:

```library(dplyr)