One error you may encounter when using R is:

**Aggregation function missing: defaulting to length**

This message (strictly an informational notice rather than a true error, since the code still runs) appears when you use the **dcast** function from the **reshape2** package to convert a data frame from a long to a wide format, but more than one value maps to an individual cell of the wide data frame.

The following example shows how to fix this error in practice.

**How to Reproduce the Error**

Suppose we have the following data frame in R that contains information about the sales of various products:

    #create data frame
    df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                     promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                     product=c(1, 2, 1, 2, 1, 2, 1, 2),
                     sales=c(12, 18, 29, 20, 30, 11, 15, 22))

    #view data frame
    df

      store promotion product sales
    1     A         Y       1    12
    2     A         Y       2    18
    3     A         N       1    29
    4     A         N       2    20
    5     B         Y       1    30
    6     B         Y       2    11
    7     B         N       1    15
    8     B         N       2    22

Now suppose we attempt to use the **dcast** function to convert the data frame from a long to a wide format:

    library(reshape2)

    #convert data frame to wide format
    df_wide <- dcast(df, store ~ product, value.var="sales")

    #view result
    df_wide

    Aggregation function missing: defaulting to length
      store 1 2
    1     A 2 2
    2     B 2 2

Notice that the **dcast** function still runs, but we receive the message **Aggregation function missing** and the cells contain counts rather than sales values.

**How to Fix the Error**

We receive this message because for each combination of **store** and **product**, there are two potential values we could use for **sales**.

For example, for store A and product 1, the sales value could be 12 or 29.

Thus, the **dcast** function defaults to using “length” as the aggregate function.

For example, the wide data frame tells us that for store A and product 1, there are a total of **2** sales values.
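To see ahead of time which cells would receive more than one value, one option is to count the rows per combination with base R's `table()`. The following is a minimal sketch that rebuilds the same data frame; the name `cell_counts` is only illustrative.

```r
#create the same data frame used in this example
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

#count how many rows fall into each store/product cell
cell_counts <- table(df$store, df$product)
cell_counts

#check whether any cell would hold more than one value
any(cell_counts > 1)
```

Any count above 1 marks a cell where **dcast** has to aggregate, which is when the message appears.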

If you’d instead like to use a different aggregation function, you can specify it with the **fun.aggregate** argument.

For example, we can use the following syntax to calculate the sum of sales by **store** and **product**:

    library(reshape2)

    #convert data frame to wide format
    df_wide <- dcast(df, store ~ product, value.var="sales", fun.aggregate=sum)

    #view result
    df_wide

      store  1  2
    1     A 41 38
    2     B 45 33

Here’s how to interpret the values in the wide data frame:

- The sum of sales for store A and product 1 is **41**.
- The sum of sales for store A and product 2 is **38**.
- The sum of sales for store B and product 1 is **45**.
- The sum of sales for store B and product 2 is **33**.
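As a quick cross-check, the same sums can be computed without **reshape2**. The sketch below uses base R's `xtabs()`, which sums the left-hand variable within each combination of the right-hand variables; the name `sales_sums` is only illustrative.

```r
#create the same data frame used in this example
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

#sum sales within each store/product combination
sales_sums <- xtabs(sales ~ store + product, data=df)
sales_sums
```

The resulting table should match the four sums listed above.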

Notice that we don’t receive any warning message this time because we used the **fun.aggregate** argument.
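If you would rather keep every individual sales value than aggregate, another possible approach is to add enough variables to the left side of the formula that each cell maps to exactly one row. In this data set, adding **promotion** appears to be sufficient. The sketch below assumes the same data frame; the name `df_wide_all` is only illustrative.

```r
library(reshape2)

#create the same data frame used in this example
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

#confirm that store, promotion and product together identify each row (0 = no duplicates)
anyDuplicated(df[c("store", "promotion", "product")])

#convert to wide format with one row per store/promotion pair
df_wide_all <- dcast(df, store + promotion ~ product, value.var="sales")

#view result
df_wide_all
```

Because no cell receives more than one value here, no aggregation is needed and the original sales figures are preserved.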

