I am using an R package which extracts data from tables in a database based on the flag for each table. If the flag is 1
, extract data from that table. If the flag is 0
, don't extract data.
The extracted data is stored in the person_list_df
list as shown below
list(structure(list(person_id = 21:25, count = 2:6), class = "data.frame", row.names = c(NA,
-5L)), structure(list(person_id = 24:28, count = 3:7), class = "data.frame", row.names = c(NA,
-5L)))
Code is below
person_list_df = list()
casesANDcontrols_df = list()
list1 <- data.frame("person_id" = 21:25, "count" = 2:6)
list2 <- data.frame("person_id" = 24:28, "count" = 3:7)
person_list_df <- list(list1, list2)
What I would like to do is
a) Combine/Merge all this and get the sum of counts (for each person)
b) Filter and store only persons who have count > 4
I tried the below but it doesn't work
casesANDcontrols_df[[1]] <- do.call(rbind, persons_list_df) # how to sum and apply filter here?
I would like to combine/merge all of them and sum their counts and finally select persons who have count > 4
I expect my output to be like as shown below
casesANDcontrols_df[[1]]
person_id
1 24 #COUNT IS 8
2 25 #COUNT IS 10
3 26 #COUNT IS 5
4 27 #COUNT IS 6
5 28 #COUNT IS 7
The structure of caseANDcontrols_df
should be as shown below
In base R, you can combine the list into one dataframe with do.call
+rbind
, take sum
for each person_id
and keep rows where count
is greater than 4.
subset(aggregate(count ~ person_id, do.call(rbind, person_list_df), sum), count > 4)