Tags: r, count, data.table, summarize

How to create a variable holding a conditional count by group in R


I have a data set that looks like this:

library(data.table)
dt <- data.table(id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), Complete = c("Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes"))

> dt
   id Complete
1:  A      Yes
2:  A       No
3:  A      Yes
4:  B      Yes
5:  B       No
6:  B      Yes
7:  C      Yes
8:  C      Yes
9:  C      Yes

I would like to build a variable N_complete that captures the total count of Complete == "Yes" by id. The final data should look like the image below. What should I do to achieve this?

I tried

dt$N_complete <- unlist(lapply(split(dt,dt$ID), function(x) rep(summarize(n(x)[x$Complete=="Yes"],na.rm=T),nrow(x))))

Sorry for the mess. I am a beginner, and the errors in my code might look very silly.

[Image: desired output]


Solution

  • Since you are using data.table, you can compute the number of 'Yes' entries per group with:

    dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]
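
For reference, here is a minimal end-to-end sketch using the dt from the question. The `:=` form adds the per-id count to every row by reference. The second call, which is not part of the original answer, shows one way to get a single row per id instead. The name summary_dt is hypothetical and only used for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Complete = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes")
)

# Complete == "Yes" is a logical vector; summing it counts the TRUE values.
# `:=` adds the column in place, repeating the group total on each row.
dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]

# Alternative (assumption: you want one row per id rather than a new column)
summary_dt <- dt[, .(N_complete = sum(Complete == "Yes", na.rm = TRUE)), by = id]

print(dt)          # rows for A and B carry 2, rows for C carry 3
print(summary_dt)
```

Because `:=` modifies dt by reference, no reassignment such as dt <- ... is needed. The na.rm = TRUE argument only matters if Complete can contain NA values.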