Search code examples
rdataframecbind

Sum count data in a dataframe based on the size of an associated numeric variable


I have a data frame with data as follows (although my data set is much bigger)

ID  Count  Size
1   1      35
1   2      42
1   2      56
2   3      25
2   5      52
2   2      62

etc....

I would like to extract the total count for each ID but split for when the size variable is either <50 or <=50

So far I have done this to get the cumulative count based on the unique id

cbind(aggregate(Count~ID, sum, data=df)

To produce this

ID Count
1  5
2  10

But I want to produce something like this instead

ID  <50  >=50
1   3    2
2   3    7

I've tried searching on how best to do this and am sure it is really simple but I'm struggling on how best to achieve this...any help would be great thanks!


Solution

  • We could use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'ID', we get the sum of 'Count' based on the logical indexes ('Size < 50,Size >=50`)

    library(data.table)
    setDT(df1)[,list(`<50` = sum(Count[Size <50]), 
                `>=50` = sum(Count[Size>=50])) , by = ID]
    #   ID <50 >=50
    #1:  1   3    2
    #2:  2   3    7
    

    A similar option with dplyr is

    library(dplyr)
    df1 %>%
        group_by(ID) %>% 
        summarise(`<50` = sum(Count[Size <50]),
                 `>=50` = sum(Count[Size>=50]))
    

    NOTE: It is better to name the columns as less50, greaterthanEq50 instead of the names suggested in the expected output.