r dplyr counting longitudinal

Counting occurrences between observations


I'm stuck on the following problem. I have data like this:

df <- data.frame(
  ID = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)


 ID  Pr  Yrs
  1   0 2010
  1   1 2011
  1   0 2012
  1 999 2013
  1  -1 2014
  1   1 2015
  2 999 2010
  2   1 2011
  2   0 2012
  2   0 2013
  2   1 2014
  3   0 2012
  3   1 2013
  3   0 2014
  3   0 2015

I would like to get:

a) the number of unique IDs that have Pr equal to "1" exactly once;

b) the distance (in years) between the first occurrence of "1" and the next occurrence of "1", per group (ID).

Thank you for your help.


Solution

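  • First build a per-ID summary table with data.table (this assumes the rows are ordered by Yrs within each ID, as in the example):

    library(data.table)
    setDT(df)  # convert df to a data.table by reference
    
    df_summ <- 
      df[, {
        one <- which(Pr == 1)  # row positions within this ID where Pr == 1
        # one[1:2] is padded with NA when there are fewer than two 1s, so gap becomes NA
        .(num_ones = length(one), gap = diff(Yrs[one[1:2]]))
      }, by = ID]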
    

    Both answers can then be read from this summary.

    a) The number of unique IDs that have "1" exactly once:

    df_summ[, sum(num_ones == 1)]
    # [1] 1
    

    b) The distance (in years) between the first occurrence of "1" and the next occurrence of "1", per group (ID):

    See the gap column, which is NA for any ID with fewer than two 1s.

    df_summ
    #    ID num_ones gap
    # 1:  1        2   4
    # 2:  2        2   3
    # 3:  3        1  NA
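
Since the question is tagged dplyr, here is a minimal sketch of the same summary in dplyr. It is not part of the answer above, only one possible translation of it. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and sorts by Yrs within each ID first, so that "first" and "next" refer to years rather than row order.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010,2011,2012,2013,2014,
          2012,2013,2014,2015)
)

df_summ <- df %>%
  arrange(ID, Yrs) %>%               # make sure years are in order within each ID
  group_by(ID) %>%
  summarise(
    num_ones = sum(Pr == 1),         # how many times Pr == 1 for this ID
    # indexing past the end of a vector yields NA, so gap is NA
    # whenever an ID has fewer than two 1s
    gap = Yrs[Pr == 1][2] - Yrs[Pr == 1][1],
    .groups = "drop"
  )

# a) number of IDs with exactly one "1"
n_once <- sum(df_summ$num_ones == 1)
n_once
# [1] 1
```

The gap column of `df_summ` answers b), with the same values as the data.table result (4, 3, NA).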