
Create function to count occurrences within groups in R


I have a dataset in which each group of patients shares a unique ID called match_no. I want to count how many patients in each group got sick in two different years, using a loop to count the occurrences across a large dataset.

for (i in db$match_no){(with(db, sum(db$TBHist16 == 1 & db$match_no == i))}

This is my attempt. I need i to cycle through each of the match numbers and count how many TB occurrences there were.

Can anyone correct my formula, please?

Here is an example:

df1 <- data.frame(Match_no = c(1, 1,1,1,1,2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5),
                  var1 = c(1,1,1,0,0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,0,1))

I want to count how many 1 values there are in each match number.

Thank you


Solution

  • Some ideas:

    1. Simple summary of all Match_no values:

      xtabs(~var1 + Match_no, data = df1)
      #     Match_no
      # var1 1 2 3 4 5
      #    0 2 2 1 3 1
      #    1 3 3 4 2 4
      
    2. Same idea as 1, but counting only the rows where var1 == 1 by taking a subset first:

      xtabs(~ Match_no, data = subset(df1, var1 == 1))
      # Match_no
      # 1 2 3 4 5 
      # 3 3 4 2 4 
      
    3. The same counts returned as a data frame:

      aggregate(var1 ~ Match_no, data = subset(df1, var1 == 1), FUN = length)
      #   Match_no var1
      # 1        1    3
      # 2        2    3
      # 3        3    4
      # 4        4    2
      # 5        5    4
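
If you do want to keep the loop, here is a minimal sketch of how the original attempt could be repaired. It uses the example df1 from the question; the names ids, counts and counts2 are made up for illustration. The original loop has three problems: it iterates over every row's match number rather than each distinct one, it never stores or prints the result (values computed inside a for loop are not auto-printed), and its parentheses are unbalanced. The with() wrapper is also redundant when the columns are already referenced with $.

```r
# Example data from the question
df1 <- data.frame(
  Match_no = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5),
  var1     = c(1,1,1,0,0, 1,1,1,0,0, 0,1,1,1,1, 1,0,0,0,1, 1,1,1,0,1)
)

# Visit each distinct match number once, not once per row
ids <- unique(df1$Match_no)

# Preallocate a vector to hold one count per match number
counts <- integer(length(ids))

for (k in seq_along(ids)) {
  # Count the rows in this group where var1 equals 1, and store the result
  counts[k] <- sum(df1$var1 == 1 & df1$Match_no == ids[k])
}
names(counts) <- ids
counts

# Loop-free equivalent: sum a logical vector within each group
counts2 <- tapply(df1$var1 == 1, df1$Match_no, sum)
counts2
```

Both versions give 3, 3, 4, 2 and 4 for match numbers 1 to 5, matching the output above. Unlike the subset-based ideas, they also keep a match number whose count is zero, because no rows are dropped before counting. If the real column (TBHist16 in the question) contains NA values, sum(..., na.rm = TRUE) may be needed.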