One message you may encounter when using R is:
Aggregation function missing: defaulting to length
This message appears when you use the dcast function from the reshape2 package to convert a data frame from a long to a wide format and more than one value could be placed in an individual cell of the wide data frame. Strictly speaking it is a message rather than an error: dcast still returns a result, but each cell holds a count of values instead of the values themselves.
The following example shows how to address it in practice.
How to Reproduce the Error
Suppose we have the following data frame in R that contains information about the sales of various products:
#create data frame
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))
#view data frame
df
  store promotion product sales
1     A         Y       1    12
2     A         Y       2    18
3     A         N       1    29
4     A         N       2    20
5     B         Y       1    30
6     B         Y       2    11
7     B         N       1    15
8     B         N       2    22
Now suppose we attempt to use the dcast function to convert the data frame from a long to a wide format:
library(reshape2)
#convert data frame to wide format
df_wide <- dcast(df, store ~ product, value.var="sales")
#view result
df_wide
Aggregation function missing: defaulting to length
  store 1 2
1     A 2 2
2     B 2 2
Notice that the dcast function still runs, but it prints the message Aggregation function missing: defaulting to length, and the cells of the result contain counts rather than sales values.
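To confirm that dcast signals a message and still returns a result, the condition can be captured with base R's withCallingHandlers. This is only a sketch: the variable name captured is a hypothetical choice for illustration, and the data frame is recreated so the block runs on its own.

```r
library(reshape2)

# Recreate the example data frame
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

# Collect the text of any message dcast signals (hypothetical helper variable)
captured <- character(0)

df_wide <- withCallingHandlers(
  dcast(df, store ~ product, value.var="sales"),
  message = function(m) {
    # Store the message text, then silence it and let dcast continue
    captured <<- c(captured, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)

#view the captured message text
captured

#view the result that dcast still returned
df_wide
```

Because the handler only muffles the message, df_wide should hold the same table of counts as before.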
How to Fix the Error
We receive this message because, for each combination of store and product, there are two possible values we could use for sales.
For example, for store A and product 1, the sales value could be 12 or 29.
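Because it has no rule for combining these values, the dcast function defaults to using “length” as the aggregation function, so each cell counts the values instead of summarizing them.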
For example, the wide data frame tells us that for store A and product 1, there are a total of 2 sales values.
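One way to check in advance whether a reshape will need aggregation is to count the rows for each store and product combination with base R's table function. The sketch below assumes the same example data; the names counts and needs_aggregation are hypothetical and chosen only for illustration.

```r
# Recreate the example data frame
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

# Count the rows for each store/product combination
counts <- table(df$store, df$product)

#view counts
counts

# TRUE if any cell of the wide data frame would receive more than one value
needs_aggregation <- any(counts > 1)
needs_aggregation
```

Any count greater than 1 means dcast would have to combine several values for that cell, which is the situation that triggers the message.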
If you’d instead like to use a different aggregation function, you can specify it with the fun.aggregate argument.
For example, we can use the following syntax to calculate the sum of sales by store and product:
library(reshape2)
#convert data frame to wide format
df_wide <- dcast(df, store ~ product, value.var="sales", fun.aggregate=sum)
#view result
df_wide
  store  1  2
1     A 41 38
2     B 45 33
Here’s how to interpret the values in the wide data frame:
- The sum of sales for store A and product 1 is 41.
- The sum of sales for store A and product 2 is 38.
- The sum of sales for store B and product 1 is 45.
- The sum of sales for store B and product 2 is 33.
Notice that we don’t receive the message this time because we supplied the fun.aggregate argument.
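If you would rather keep every individual sales value than aggregate, another option is to add the remaining identifying column to the formula so that each cell receives exactly one value. The sketch below assumes that store, promotion, and product together identify each row uniquely, which holds for this example data; the name df_wide2 is hypothetical.

```r
library(reshape2)

# Recreate the example data frame
df <- data.frame(store=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 promotion=c('Y', 'Y', 'N', 'N', 'Y', 'Y', 'N', 'N'),
                 product=c(1, 2, 1, 2, 1, 2, 1, 2),
                 sales=c(12, 18, 29, 20, 30, 11, 15, 22))

# Put promotion on the left-hand side so each row/column pair is unique
df_wide2 <- dcast(df, store + promotion ~ product, value.var="sales")

#view result
df_wide2
```

With one row per store and promotion pair, no aggregation is needed, so the cells contain the original sales values rather than counts or sums.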