How to Count Distinct Values Using dplyr (With Examples)


You can use one of the following methods to count the number of distinct values in an R data frame using the n_distinct() function from dplyr:

Method 1: Count Distinct Values in One Column

n_distinct(df$column_name)

Method 2: Count Distinct Values in All Columns

sapply(df, function(x) n_distinct(x))

Method 3: Count Distinct Values by Group

df %>%
  group_by(grouping_column) %>%
  summarize(count_distinct = n_distinct(values_column))

The examples below show how to use each of these methods in practice with this data frame:

library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

#view data frame
df

  team points assists
1    A      6       3
2    A      6       6
3    A      8       4
4    A     10       2
5    B      9       4
6    B      9       5
7    B     12       5
8    B     12       9

Method 1: Count Distinct Values in One Column

The following code shows how to use n_distinct() to count the number of distinct values in the ‘team’ column:

#count distinct values in 'team' column
n_distinct(df$team)

[1] 2

There are 2 distinct values in the ‘team’ column.
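
Note that n_distinct() counts NA as a distinct value unless told otherwise. The data frame above contains no missing values, so the following is only a minimal sketch using a hypothetical vector named x (an assumption for illustration) to show how the na.rm argument changes the count:

```r
library(dplyr)

#hypothetical vector with one missing value (not part of the example data frame)
x <- c('A', 'A', 'B', NA)

#by default NA is counted as its own distinct value
count_with_na <- n_distinct(x)

#set na.rm = TRUE to leave NA out of the count
count_without_na <- n_distinct(x, na.rm = TRUE)

count_with_na
count_without_na
```

Here the first count is 3 (A, B and NA) and the second is 2 (A and B).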

Method 2: Count Distinct Values in All Columns

The following code shows how to use the sapply() and n_distinct() functions to count the number of distinct values in each column of the data frame:

#count distinct values in every column
sapply(df, function(x) n_distinct(x))

   team  points assists 
      2       5       6

From the output we can see:

  • There are 2 distinct values in the ‘team’ column.
  • There are 5 distinct values in the ‘points’ column.
  • There are 6 distinct values in the ‘assists’ column.
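
If you would rather stay within dplyr, one possible alternative is to combine summarize() with across(). This is a sketch that assumes dplyr 1.0.0 or later (the version that introduced across()); the name distinct_counts is chosen here only for illustration. It returns a one-row data frame instead of a named vector:

```r
library(dplyr)

#same data frame as in the examples above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

#count distinct values in every column, returned as a one-row data frame
distinct_counts <- df %>%
  summarize(across(everything(), n_distinct))

distinct_counts
```

The counts match the sapply() approach: 2 for ‘team’, 5 for ‘points’ and 6 for ‘assists’.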

Method 3: Count Distinct Values by Group

The following code shows how to use the n_distinct() function to count the number of distinct values in the ‘points’ column for each team:

#count distinct 'points' values by 'team'
df %>%
  group_by(team) %>%
  summarize(distinct_points = n_distinct(points))

# A tibble: 2 x 2
  team  distinct_points 
1 A                   3
2 B                   2

From the output we can see:

  • There are 3 distinct points values for team A.
  • There are 2 distinct points values for team B.
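
The same pattern can be extended to several columns at once by adding more arguments to summarize(). The sketch below is one way this might look; the names by_team and distinct_assists are assumptions introduced here for illustration:

```r
library(dplyr)

#same data frame as in the examples above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

#count distinct 'points' and 'assists' values by 'team'
by_team <- df %>%
  group_by(team) %>%
  summarize(distinct_points = n_distinct(points),
            distinct_assists = n_distinct(assists))

by_team
```

For this data, team A has 3 distinct points values and 4 distinct assists values, while team B has 2 distinct points values and 3 distinct assists values.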

Additional Resources

The following tutorials explain how to perform other common operations using dplyr:

How to Recode Values Using dplyr
How to Replace NA with Zero in dplyr
How to Rank Variables by Group Using dplyr
How to Select the First Row by Group Using dplyr
