
Adding a column indicating current count of non-missing rows for the same ID


I have a quick question about counting the non-missing entries of a column. Let's say I have data that looks like this:

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,3),var1=c(NA,2,5,3,NA,NA,6,4,4,NA,7))

How do I add a new column that counts, within each id, the running number of non-missing var1 values so far (as below)?

data<-data.frame(id=c(1,1,1,1,2,2,2,3,3,3,3),var1=c(NA,2,5,3,NA,NA,6,4,4,NA,7),count_nm=c(NA,1,2,3,NA,NA,1,1,2,NA,3))

The best I could do was to drop the rows where var1 is NA and then count per ID. But I would like to know how to do it without deleting those rows. Thanks!


Solution

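  • You can use cumsum on complete.cases(var1), which is TRUE for every non-missing value, grouped by id:

    library(dplyr)  # `.by` needs dplyr >= 1.1.0; the native pipe `|>` needs R >= 4.1
    data |> 
      mutate(count_nm = cumsum(complete.cases(var1)), .by = id)

    Note that this fills every row with the running count: rows where var1 is NA repeat the previous count (0 before the first non-missing value) instead of showing NA as in the desired output.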
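    To reproduce the desired count_nm column exactly, with NA on the missing rows, the running count can be masked with replace(). A minimal base R sketch (no packages needed); running_nm is a hypothetical helper name used only for illustration:

```r
# Data from the question
data <- data.frame(id   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                   var1 = c(NA, 2, 5, 3, NA, NA, 6, 4, 4, NA, 7))

# Hypothetical helper: running count of non-missing values,
# with NA written back on the rows where x itself is NA
running_nm <- function(x) replace(cumsum(!is.na(x)), is.na(x), NA)

# ave() applies the helper within each id and keeps the original row order
data$count_nm <- ave(data$var1, data$id, FUN = running_nm)
print(data)
```

    The same wrapper also works in the dplyr call: mutate(count_nm = replace(cumsum(!is.na(var1)), is.na(var1), NA), .by = id).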
    

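    I also like the convenient collapse::fcumsum function, which has an na.rm argument. With na.rm = TRUE (and the default fill = FALSE), missing values are skipped and stay NA in the result, so the NA rows come out as in the desired count_nm column:

    library(dplyr)
    data |> 
      # map every non-missing value to 1 so that zero or negative values are counted too
      mutate(count_nm = collapse::fcumsum(replace(var1, !is.na(var1), 1), na.rm = TRUE), .by = id)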
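    The subset-then-count idea from the question also works without deleting rows: count only the non-missing rows, then write the result back by logical index so every other row keeps NA. A minimal base R sketch; keep and ids are illustrative variable names, not part of any package:

```r
# Data from the question
data <- data.frame(id   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                   var1 = c(NA, 2, 5, 3, NA, NA, 6, 4, 4, NA, 7))

keep <- !is.na(data$var1)      # FALSE on the rows that would have been deleted
ids  <- data$id[keep]          # ids of the non-missing rows, in original order

data$count_nm <- NA_integer_   # start with NA on every row
# Number the non-missing rows 1, 2, 3, ... within each id and write them back by position
data$count_nm[keep] <- ave(seq_along(ids), ids, FUN = seq_along)
print(data)
```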