I have the dataframe below, showing sampling events and when each individual was observed, I want a new column showing the number of sampling events between the current observation and the previous most recent observation of an individual.
so what I have is this:
tibble(ind=c(rep("A",10),rep("B",10)),
event=c(c(1:10),c(1:10)),
obs=c(NA,NA,1,2,NA,NA,NA,4,NA,5,1,NA,NA,NA,2,NA,3,4,NA,5))
# A tibble: 20 × 3
ind event obs
<chr> <int> <dbl>
1 A 1 NA
2 A 2 NA
3 A 3 1
4 A 4 2
5 A 5 NA
6 A 6 NA
7 A 7 NA
8 A 8 4
9 A 9 NA
10 A 10 5
11 B 1 1
12 B 2 NA
13 B 3 NA
14 B 4 NA
15 B 5 2
16 B 6 NA
17 B 7 3
18 B 8 4
19 B 9 NA
20 B 10 5
And what I want is this, where missed.events = (event at obs[i])-(event at obs[i-1])-1:
# A tibble: 20 × 4
ind event obs missed.events
<chr> <int> <dbl> <dbl>
1 A 1 NA NA
2 A 2 NA NA
3 A 3 1 NA
4 A 4 2 0
5 A 5 NA NA
6 A 6 NA NA
7 A 7 NA NA
8 A 8 4 3
9 A 9 NA NA
10 A 10 5 1
11 B 1 1 NA
12 B 2 NA NA
13 B 3 NA NA
14 B 4 NA NA
15 B 5 2 3
16 B 6 NA NA
17 B 7 3 1
18 B 8 4 0
19 B 9 NA NA
20 B 10 5 1
I'm sure I am missing something obvious here, and I've tried various sorts of case_when() and if_else() conditions within a group_by(ind) %>% mutate()
argument, but can't quite figure this out.
thanks in advance
Obtain the subset with missed events and join back to the original dataframe
df %>%
filter(!is.na(obs)) %>%
mutate(missed.events=event-lag(event)-1, .by=ind) %>%
right_join(df)
Joining, by = c("ind", "event", "obs")
# A tibble: 20 x 4
ind event obs missed.events
<chr> <int> <dbl> <dbl>
1 A 1 NA NA
2 A 2 NA NA
3 A 3 1 NA
4 A 4 2 0
5 A 5 NA NA
6 A 6 NA NA
7 A 7 NA NA
8 A 8 4 3
9 A 9 NA NA
10 A 10 5 1
11 B 1 1 NA
12 B 2 NA NA
13 B 3 NA NA
14 B 4 NA NA
15 B 5 2 3
16 B 6 NA NA
17 B 7 3 1
18 B 8 4 0
19 B 9 NA NA
20 B 10 5 1