How to Reshape Data Between Wide and Long Format in R

How to reshape data between wide and long format in R

This tutorial explains how to reshape data between wide and long format in R using the packages reshape2 and tidyr along with several examples.

Wide vs. Long 

Wide data has a column for each variable. By nature, this data looks wide. For example, the built-in R dataset iris is currently in a wide format:

#view first six lines of iris dataset
head(iris)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

By contrast, long data has a column for possible variable types and a column for the values of those variables. By nature, this data looks longFor example, here is the iris dataset in a long format:

#  Species     variable value
#1  setosa Sepal.Length   5.1
#2  setosa Sepal.Length   4.9
#3  setosa Sepal.Length   4.7
#4  setosa Sepal.Length   4.6
#5  setosa Sepal.Length   5.0
#6  setosa Sepal.Length   5.4
...
#      Species    variable value
#595 virginica Petal.Width   2.5
#596 virginica Petal.Width   2.3
#597 virginica Petal.Width   1.9
#598 virginica Petal.Width   2.0
#599 virginica Petal.Width   2.3
#600 virginica Petal.Width   1.8

In most cases, datasets are recorded in a wide format simply because it’s more convenient and natural to do so. Thus, often when you’re importing a dataset into R it will be in a wide format.

However, long formats are often required for advanced statistical analysis techniques like regression. In addition, many graphing packages like ggplot2 require your data to be in a long format to generate most plots. 

Depending on what you’d like to do with your dataset, you may need it to be in a wide or long format. This tutorial explains how to convert data between wide and long formats using the tidyr package and the reshape2 package.

tidyr

The tidyr package has two functions for reshaping data:

gather()  – converts dataset from wide to long format. 

spread() – converts dataset from long to wide format.

gather()

The basic syntax for gather() is as follows:

gather(data, key = “key”, value = “value”, …)

  • data – the name of your dataset
  • key – name of the new key column
  • value – name of the new value column
  • – name of source columns that contain values

The following code illustrates how to convert the iris dataset from wide to long format using gather():

#install (if not already installed) and load tidyr package
if(!require(tidyr)){install.packages('tidyr')}

#convert iris dataset from wide to long format using gather()
long_iris <- gather(iris, key = "flower_part", value = "measurement",
                    Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)

#view first and last 6 rows of long iris dataset
head(long_iris); tail(long_iris)

#  Species  flower_part measurement
#1  setosa Sepal.Length         5.1
#2  setosa Sepal.Length         4.9
#3  setosa Sepal.Length         4.7
#4  setosa Sepal.Length         4.6
#5  setosa Sepal.Length         5.0
#6  setosa Sepal.Length         5.4

#      Species flower_part measurement
#595 virginica Petal.Width         2.5
#596 virginica Petal.Width         2.3
#597 virginica Petal.Width         1.9
#598 virginica Petal.Width         2.0
#599 virginica Petal.Width         2.3
#600 virginica Petal.Width         1.8

spread()

The basic syntax for spread() is as follows:

spread(data, key = “key”, value = “value”)

  • data – the name of your dataset
  • key – name of the column containing the new column names
  • value – name of the column containing values

The following code illustrates how to convert the built-in tidyr dataset table2 from long to wide format using spread():

#install (if not already installed) and load tidyr package
if(!require(tidyr)){install.packages('tidyr')}

#view first 6 rows of table2 dataset
head(table2)

#  country      year type           count                    
#1 Afghanistan  1999 cases            745
#2 Afghanistan  1999 population  19987071
#3 Afghanistan  2000 cases           2666
#4 Afghanistan  2000 population  20595360
#5 Brazil       1999 cases          37737
#6 Brazil       1999 population 172006362

#convert table2 from long to wide format using spread()
wide_table2 <- spread(table2, key = type, value = count)

#view first 6 rows of wide_table2 dataset
head(wide_table2);

#  country      year  cases population                 
#1 Afghanistan  1999    745   19987071
#2 Afghanistan  2000   2666   20595360
#3 Brazil       1999  37737  172006362
#4 Brazil       2000  80488  174504898
#5 China        1999 212258 1272915272
#6 China        2000 213766 1280428583

reshape2

The reshape2 package also has two functions for reshaping data:

melt()  – converts dataset from wide to long format. 

dcast() – converts dataset from long to wide format.

melt()

The basic syntax for melt() is as follows:

melt(data, id.vars = c(id_variable1, …), measure.vars = c(measure_var1, …))

  • data – the name of your dataset
  • id.vars – list of one or more columns to be used as IDs
  • measure.vars – list of one or more columns to be used as measurements

The following code illustrates how to convert the iris dataset from wide to long format using melt():

#install (if not already installed) and load reshape2 package
if(!require(reshape2)){install.packages('reshape2')}

#view first six lines of iris dataset
head(iris)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

#convert iris dataset from wide to long format using melt()
long_iris <- melt(iris, id.vars = c('Species'), measure.vars = c('Sepal.Length',
                    'Sepal.Width', 'Petal.Length', 'Petal.Width'))

#view first six lines of long iris dataset 
head(long_iris)

#  Species     variable value
#1  setosa Sepal.Length   5.1
#2  setosa Sepal.Length   4.9
#3  setosa Sepal.Length   4.7
#4  setosa Sepal.Length   4.6
#5  setosa Sepal.Length   5.0
#6  setosa Sepal.Length   5.4

Note: If you leave out the measure.vars, melt will automatically use all the other variables as the id.vars. The opposite is true if you leave out id.vars.

dcast()

The basic syntax for dcast() is as follows:

dcast(data, formula, value.var = c(value_var1, …))

  • data – the name of your dataset
  • formula- a formula that specifies which columns we’d like to keep the same and which we’d like to use as new columns
  • value.var – Name of the column(s) whose values will be filled to cast. If none is provided, dcast() will attempt to guess this column(s) automatically.

The following code illustrates how to convert the built-in tidyr dataset table2 from long to wide format using dcast():

#load necessary packages
library(tidyr)
libary(reshape2)

#view first six lines of table2 
head(table2)

#  country      year type           count                    
#1 Afghanistan  1999 cases            745
#2 Afghanistan  1999 population  19987071
#3 Afghanistan  2000 cases           2666
#4 Afghanistan  2000 population  20595360
#5 Brazil       1999 cases          37737
#6 Brazil       1999 population 172006362

#convert table2 from long to wide format using dcast()
wide_table2 <- dcast(table2, country + year ~ type, value.var = 'count')

#view first six lines of wide_table2
head(wide_table2)

#      country year  cases population
#1 Afghanistan 1999    745   19987071
#2 Afghanistan 2000   2666   20595360
#3      Brazil 1999  37737  172006362
#4      Brazil 2000  80488  174504898
#5       China 1999 212258 1272915272
#6       China 2000 213766 1280428583

Leave a Reply

Your email address will not be published. Required fields are marked *