How to Use the clean_names() Function in R


Often you may want to “clean” the column names of a data frame in R so that they all have a standard pattern or case.

One of the easiest ways to do so is to use the clean_names() function from the janitor package.

The clean_names() function uses the following syntax:

clean_names(dat, case)

where:

  • dat: The data frame whose column names should be cleaned
  • case: The desired target case

The case argument defaults to “snake” and accepts values such as the following:

  • “snake” produces snake_case
  • “lower_camel” produces lowerCamel
  • “upper_camel” produces UpperCamel
  • “all_caps” produces ALL_CAPS
  • “lower_upper” produces lowerUPPER
  • “upper_lower” produces UPPERlower
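
The effect of each value is easiest to see on a name that contains a word break. The following is a minimal sketch, assuming the janitor package is installed. It uses make_clean_names(), the janitor helper that applies the same cleaning to a character vector, and the input name “Total Rebounds” is a made-up example:

```r
library(janitor)

# hypothetical column name with a space marking the word break
raw_name <- "Total Rebounds"

# the case values listed above
cases <- c("snake", "lower_camel", "upper_camel",
           "all_caps", "lower_upper", "upper_lower")

# clean the same name once per case and collect the results
cleaned <- sapply(cases, function(cs) make_clean_names(raw_name, case = cs))

# named character vector: names are the case values, values are the cleaned name
print(cleaned)
```

Running this prints one cleaned version of the name per case, which makes it easy to compare the styles side by side before picking one.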

Typically you will use this function when you’re in the data cleaning phase of a project and you would like all of the columns in a particular data frame to have a uniform pattern.

The following example shows how to use the clean_names() function in practice in R.

Note: Before using the clean_names() function, you may need to first install the janitor package. You can use the following syntax to do so:

install.packages('janitor')

Once the janitor package has been installed, you can proceed to use the clean_names() function.
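
If a script may run on a machine where the package is not yet installed, one common pattern is to install it only when it is missing. The following is a minimal sketch that uses base R's requireNamespace() for the check:

```r
# install janitor only if it is not already available
if (!requireNamespace("janitor", quietly = TRUE)) {
  install.packages("janitor")
}

# load the package for the current session
library(janitor)
```

In a non-interactive session, install.packages() may also need a repos argument that points to a CRAN mirror.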

Example: How to Use the clean_names() Function in R

Suppose that we create a data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(TEAM=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 pointsscored=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 Totalrebounds=c(30, 28, 24, 24, 30, 36, 30, 29))


#view data frame
df

  TEAM pointsscored assists Totalrebounds
1    A           99      22            30
2    A           68      28            28
3    A           86      31            24
4    A           88      35            24
5    B           95      34            30
6    B           74      45            36
7    B           78      28            30
8    B           93      31            29

Notice that the column names follow no consistent style: one is all uppercase, two are all lowercase, and one is capitalized.

To “clean” up the names and give them all a uniform style, we can use the clean_names() function from the janitor package.

We can use the following syntax to do so:

library(janitor)

#clean names of data frame
clean_names(df)

  team pointsscored assists totalrebounds
1    A           99      22            30
2    A           68      28            28
3    A           86      31            24
4    A           88      35            24
5    B           95      34            30
6    B           74      45            36
7    B           78      28            30
8    B           93      31            29

This returns the same data with the column names converted to snake_case, the default value of the case argument, so every name is now lowercase.
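
Keep in mind that clean_names() returns a new data frame rather than modifying df in place, so the result has to be assigned if we want to keep it. The following short sketch reuses the data frame from this example (redefined so the block runs on its own). The name df_clean is just an illustrative choice:

```r
library(janitor)

# same data frame as in the example above
df <- data.frame(TEAM=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 pointsscored=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 Totalrebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# store the cleaned copy under a new name
df_clean <- clean_names(df)

# names of the cleaned copy
names(df_clean)

# the original data frame still has its original names
names(df)
```

Assigning the result back to df instead (df <- clean_names(df)) would overwrite the original names.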

We can also use the case argument to request a different case.

For example, we can use the following syntax to convert each column name to only uppercase characters:

library(janitor)

#clean names of data frame
clean_names(df, case='all_caps')

  TEAM POINTSSCORED ASSISTS TOTALREBOUNDS
1    A           99      22            30
2    A           68      28            28
3    A           86      31            24
4    A           88      35            24
5    B           95      34            30
6    B           74      45            36
7    B           78      28            30
8    B           93      31            29

This returns the same data frame with all column names converted to uppercase characters.

We could also choose a different case, such as lower_camel, which converts each column name to lower camel case:

library(janitor)

#clean names of data frame
clean_names(df, case='lower_camel')

  team pointsscored assists totalrebounds
1    A           99      22            30
2    A           68      28            28
3    A           86      31            24
4    A           88      35            24
5    B           95      34            30
6    B           74      45            36
7    B           78      28            30
8    B           93      31            29

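Each column name is now in lower camel case. Because the original names contain no separators or mid-word case changes that mark word boundaries, clean_names() treats each name as a single word, so the result matches the default output.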
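
Lower camel case is easier to recognize when the original names contain separators between words. The following is a minimal sketch with a hypothetical data frame (df2 and its column names are invented for illustration). The check.names = FALSE argument keeps data.frame() from altering the space in the first name:

```r
library(janitor)

# hypothetical data frame whose names contain a space, a dot, and an underscore
df2 <- data.frame("Points Scored" = c(99, 68),
                  "total.rebounds" = c(30, 28),
                  "assists_made" = c(22, 28),
                  check.names = FALSE)

# word boundaries are now detectable, so every word after the first is capitalized
camel_names <- names(clean_names(df2, case = 'lower_camel'))
camel_names
# expected: "pointsScored" "totalRebounds" "assistsMade"
```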

Feel free to use whichever case you would like by specifying it in the case argument of the clean_names() function.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Sort a Table in R
How to Plot a Table in R
How to Create a Three-Way Table in R
How to Create a Frequency Table by Group in R
