Search code examples
rdataframesumifs

Aggregate data in one column while maintaining/return data in other columns


I have a data frame (called “dk”) like this:

Date        Country  ID      Description    Qty
21/05/2014  DK       17423   Frontline      240
26/05/2014  DK       17423   Frontline      360
21/05/2014  DK       73663   Frontline      77
21/05/2014  DK       73663   Frontline      120
...

I want to sum quantities per ID. The result should still show me the Country and Description (besides the ID and Qty). And if the dates are different for one ID, then mark with the earlier date, as follow:

Date        Country ID      Description      Qty
21/05/2014  DK      17423   Frontline        600
21/05/2014  DK      73663   Frontline        197

I’ve used aggregate, but it removed info from other columns.

data <- aggregate(dk$Qty ~ dk$ID, subset(dk, ID == 17423),sum)

ID      Qty
17423   600
73663   197

How could I get the result as described?

Thank you.


Solution

  • Could also do it with data.table package (I'll assume your dates are Date class)

    library(data.table)
    setDT(dk)[, list(Qty = sum(Qty), Date = min(Date)), by = c("ID", "Country", "Description")]
    

    If you want to aggregate only by ID, you could do the following (when I assume you want the first argument of rest of the columns)

    setDT(dk)[, lapply(.SD, function(x) ifelse(is.numeric(x), sum(x), head(as.character(x), 1))), by = ID]
    

    If Date isn't Date class you could do first

    dk <- data.table(dk, key = "ID") # Creates a data.table object and sorts it by "ID"
    dk[, Date:= as.Date(as.character(Date), "%d/%m/%Y")] #Transforms Date to "Date" class
    

    And then you can proceed as usual (just without setDT because it's already data.table class), for example:

    dk[, list(Qty = sum(Qty), Date = min(Date)), by = c("ID", "Country", "Description")]
    
    ##       ID Country Description Qty       Date
    ## 1: 17423      DK   Frontline 600 2014-05-21
    ## 2: 73663      DK   Frontline 197 2014-05-21