id = sample.int(50,1000, replace = TRUE)
years <- c("2010", "2011", "2013", "2014", "2015")
year<- sample(years, 1000, prob=c(0.2,0.2,0.2,0.2, 0.2), replace = TRUE)
my_data = data.frame(id, year)
my_data <- my_data[order(id),]
For example, patient # 1 visited the hospital twice in 2010, once in 2011, no times in 2012 ... patient # 2 visited the hospital no times in 2010, 5 times in 2011, 3 times in 2012, etc.
For this dataset, I want to find out the number of the number of times each combination of "years" appears (not including duplicates). For example - since there are 5 possible years, there will be 2^5 = 32 possible combinations :
# sample output of the final result
combinations frequency
2010 11
2011 9
2012 5
2013 1
2014 19
2015 11
2011,2012 9
2011, 2012, 2013 5
2013, 2015 1
2010,2011,2012,2013,2014,2015 19
This would be the equivalent of finding out:
I tried to find a way to do this different ways:
# Method 1: Did not work
library(data.table)
final = setDT(my_data)[,list(Count=.N) ,names(my_data)]
# Method 2: Did not work
final = = aggregate(.~year,rev(aggregate(.~id,my_data,paste0,collapse = ",")),length)
# Method 3: Not sure if this is correct? There are 50 rows (instead of 32) and all frequency counts are 1?
library(dplyr)
library(stringr)
df = my_data %>%
group_by(id) %>%
summarise(years = str_c(year, collapse=",")) %>%
count(years)
My Question: Can someone please show me how to fix this?
In base R:
table(tapply(my_data$year, my_data$id,
function(x) paste(sort(unique(x)), collapse=',')))