I have a problem solving the following question:
the values start and end are repeated in every other rows. I would like to mark them with a unique number, that increases every time a new start value is reached.
I also want rows where the obj column is empty and not inbetween a start and an end, to remain empty in the ID column.
Thanks a lot in advance!
There might be a more streamlined way, but you can use cumsum()
with some logic.
library(dplyr)
library(tidyr)
x <- c("start", "end", NA)
df <- tibble(obj = x[c(1, 2, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 2, 3, 3, 1, 3, 2)])
df %>%
mutate(ID = cumsum(replace_na(obj == "start", 0)),
ID = if_else(ID == cumsum(replace_na(obj == "end", 0)) & is.na(obj), NA_integer_, ID))
# A tibble: 18 x 2
obj ID
<chr> <int>
1 start 1
2 end 1
3 start 2
4 NA 2
5 end 2
6 NA NA
7 NA NA
8 start 3
9 NA 3
10 end 3
11 start 4
12 NA 4
13 end 4
14 NA NA
15 NA NA
16 start 5
17 NA 5
18 end 5