I have a data frame (called “dk”) like this:
Date Country ID Description Qty
21/05/2014 DK 17423 Frontline 240
26/05/2014 DK 17423 Frontline 360
21/05/2014 DK 73663 Frontline 77
21/05/2014 DK 73663 Frontline 120
...
I want to sum quantities per ID. The result should still show me the Country and Description (besides the ID and Qty). And if the dates are different for one ID, then mark with the earlier date, as follow:
Date Country ID Description Qty
21/05/2014 DK 17423 Frontline 600
21/05/2014 DK 73663 Frontline 197
I’ve used aggregate
, but it removed info from other columns.
data <- aggregate(dk$Qty ~ dk$ID, subset(dk, ID == 17423),sum)
ID Qty
17423 600
73663 197
How could I get the result as described?
Thank you.
Could also do it with data.table
package (I'll assume your dates are Date
class)
library(data.table)
setDT(dk)[, list(Qty = sum(Qty), Date = min(Date)), by = c("ID", "Country", "Description")]
If you want to aggregate only by ID
, you could do the following (when I assume you want the first argument of rest of the columns)
setDT(dk)[, lapply(.SD, function(x) ifelse(is.numeric(x), sum(x), head(as.character(x), 1))), by = ID]
If Date
isn't Date
class you could do first
dk <- data.table(dk, key = "ID") # Creates a data.table object and sorts it by "ID"
dk[, Date:= as.Date(as.character(Date), "%d/%m/%Y")] #Transforms Date to "Date" class
And then you can proceed as usual (just without setDT
because it's already data.table
class), for example:
dk[, list(Qty = sum(Qty), Date = min(Date)), by = c("ID", "Country", "Description")]
## ID Country Description Qty Date
## 1: 17423 DK Frontline 600 2014-05-21
## 2: 73663 DK Frontline 197 2014-05-21