I'm stuck on this problem. I have data like this:
df <- data.frame(
ID = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
Pr = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)
ID Pr Yrs
1 0 2010
1 1 2011
1 0 2012
1 999 2013
1 -1 2014
1 1 2015
2 999 2010
2 1 2011
2 0 2012
2 0 2013
2 1 2014
3 0 2012
3 1 2013
3 0 2014
3 0 2015
I would like to get:
a) the number of unique IDs having "1" exactly once;
b) the distance (in years) between the first occurrence of "1" and the next occurrence of "1", within each ID.
Thank you for your help.
You can build a summary table with one row per ID using data.table:
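library(data.table)
setDT(df)  # convert df to a data.table by reference
df_summ <- df[, {
  one <- which(Pr == 1)  # positions of the 1s within this ID (assumes rows are in year order)
  .(num_ones = length(one),          # how many 1s the ID has
    gap = diff(Yrs[one[1:2]]))       # years between the 1st and 2nd 1; NA if fewer than two
}, by = ID]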
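If you would rather not depend on data.table, the same per-ID logic can be sketched in base R with `split()` and `lapply()`. This is an alternative technique, not the approach above; the helper name `summarise_id` is hypothetical, and it likewise assumes rows are already in year order within each ID.

```r
# Same example data as in the question
df <- data.frame(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)

# Hypothetical helper: summarise one ID's rows
summarise_id <- function(d) {
  one <- which(d$Pr == 1)                  # positions of the 1s within this ID
  data.frame(ID       = d$ID[1],
             num_ones = length(one),       # how many 1s
             gap      = diff(d$Yrs[one[1:2]]))  # NA when fewer than two 1s
}

# Apply the helper to each ID and stack the one-row results
df_summ_base <- do.call(rbind, lapply(split(df, df$ID), summarise_id))
rownames(df_summ_base) <- NULL
df_summ_base
```

Indexing with `one[1:2]` yields `NA` for the missing second position, so `diff()` returns `NA` for IDs with a single 1 (or none) instead of raising an error.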
Both answers can then be read off this summary.
a) The number of unique IDs having "1" exactly once:
df_summ[, sum(num_ones == 1)]
# [1] 1
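For part a) alone, a possible shortcut (a sketch, not required by the approach above) is to count the 1s per ID directly on the rows where `Pr == 1`. IDs with no 1s drop out of that subset, which does not affect a count of IDs with exactly one 1.

```r
library(data.table)

dt <- data.table(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)

# Number of 1s per ID, computed only from the rows where Pr == 1
ones_per_id <- dt[Pr == 1, .N, by = ID]

n_once   <- ones_per_id[N == 1, .N]  # how many IDs have exactly one 1
ids_once <- ones_per_id[N == 1, ID]  # which IDs they are
n_once
ids_once
```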
b) The distance (in years) between the first occurrence of "1" and the next occurrence of "1", per ID: see the gap column, which is NA for IDs with fewer than two 1s.
df_summ
# ID num_ones gap
# 1: 1 2 4
# 2: 2 2 3
# 3: 3 1 NA
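Because `which(Pr == 1)` works on row positions, the gap is only meaningful if rows are in chronological order within each ID, as they are in the example data. If your real data might not be sorted, a minimal safeguard is to sort by `ID` and `Yrs` first with `setorder()`. The shuffled row order below is made up purely for illustration.

```r
library(data.table)

df <- data.table(
  ID  = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Pr  = c(0, 1, 0, 999, -1, 1, 999, 1, 0, 0, 1, 0, 1, 0, 0),
  Yrs = c(2010,2011,2012,2013,2014,2015, 2010, 2011, 2012, 2013, 2014, 2012, 2013, 2014, 2015)
)
# Illustrative shuffle: move ID 1's 2015 row to the top
df <- df[c(6, 1:5, 7:15)]

# Without this, ID 1's gap would come out as 2011 - 2015 = -4
setorder(df, ID, Yrs)  # sort in place by ID, then year

df_summ <- df[, {
  one <- which(Pr == 1)
  .(num_ones = length(one), gap = diff(Yrs[one[1:2]]))
}, by = ID]
df_summ
```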