I have a data frame with data as follows (although my data set is much bigger)
ID Count Size
1 1 35
1 2 42
1 2 56
2 3 25
2 5 52
2 2 62
etc....
I would like to extract the total count for each ID but split for when the size variable is either <50 or <=50
So far I have done this to get the cumulative count based on the unique id
cbind(aggregate(Count~ID, sum, data=df)
To produce this
ID Count
1 5
2 10
But I want to produce something like this instead
ID <50 >=50
1 3 2
2 3 7
I've tried searching on how best to do this and am sure it is really simple but I'm struggling on how best to achieve this...any help would be great thanks!
We could use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'ID', we get the sum
of 'Count' based on the logical indexes ('Size < 50,
Size >=50`)
library(data.table)
setDT(df1)[,list(`<50` = sum(Count[Size <50]),
`>=50` = sum(Count[Size>=50])) , by = ID]
# ID <50 >=50
#1: 1 3 2
#2: 2 3 7
A similar option with dplyr
is
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise(`<50` = sum(Count[Size <50]),
`>=50` = sum(Count[Size>=50]))
NOTE: It is better to name the columns as less50
, greaterthanEq50
instead of the names suggested in the expected output.