I want to number certain combinations of row in a dataframe (which is ordered on ID and on Time)
tc <- textConnection('
id time end_yn number
abc 10 0 1
abc 11 0 2
abc 12 1 3
abc 13 0 1
def 10 0 1
def 15 1 2
def 16 0 1
def 17 0 2
def 18 1 3
')
test <- read.table(tc, header=TRUE)
The goal is to create a new column ("journey_nr
") that give a unique number to each row based on the journey it belongs to. Journeys are defined as a sequence of rows per id
up until to end_yn == 1
, also if end_yn
never becomes 1, the journey should also be numbered (see the expected outcome example). It is only possible to have end_yn == 0
journeys at the end of a collection of rows for an ID (as shown at row 4 for id 3). So either no end_yn == 1
has occured for that ID or that happened before the end_yn == 0
-journey (see id == abc
in the example).
I know how to number using the data.table
package, but I do not know which columns to combine in order to get the expected outcome. I've searched the data.table
-tag on SO, but could not find a similar problem.
Expected outcome:
id time end_yn number journey_nr
abc 10 0 1 1
abc 11 0 2 1
abc 12 1 3 1
abc 13 0 1 2
def 10 0 1 3
def 15 1 2 3
def 16 0 1 4
def 17 0 2 4
def 18 1 3 4
Another base R answer:
test$journey <- cumsum(c(1,head(test$number,-1)) >= test$number)
Result:
> test
id time end_yn number journey
1 abc 10 0 1 1
2 abc 11 0 2 1
3 abc 12 1 3 1
4 abc 13 0 1 2
5 def 10 0 1 3
6 def 15 1 2 3
7 def 16 0 1 4
8 def 17 0 2 4
9 def 18 1 3 4