Search code examples
rdplyrcase-when

R: add new variable based on value change using dplyr


I have dataset like the following:

id <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2)
v1 <- c("A","A","A","A","B","B","B","B","B","A","A","A","A","A","A","A","A","B","B","B","B","B","A","A","A","A")  
v2 <- c(1,1,1,2,2,2,2,2,2,2,3,3,3,1,1,1,2,2,2,2,2,2,2,3,3,3)
mydata<- data.frame(id,v1,v2)

v1 represents moving or not from a condition A= Moving, B= Not moving and v2 is a sequence from locations every id passess from, 1->2->3. I want to add a column that says whether: id moving from 1 to 2, id remains at 2, id moving from 2 to 3.

Specifically, I try to add a column to the data frame based on the following conditions: if v1=A (id moving) and v2=1 or (v2=2 if previously v2 had been 1) phase= "Going", if v1=B (not moving) phase "Staying" and if v1=A (moving) and v2=3 or (v2=2 following a "Staying" phase) phase = "Returning".

Specifically, my intended output is:

phase <- c("Going","Going","Going","Going","Staying","Staying","Staying","Staying","Staying","Returning","Returning","Returning","Returning","Going","Going","Going","Going","Staying","Staying","Staying","Staying","Staying","Returning","Returning","Returning","Returning")

mydata2 <- cbind(mydata, phase)

I tried the following:

mydata <-mydata %>% group_by(id)%>% mutate(phase= case_when(v1 == "A" & (v2 == 1 | v2==2) ~ "Going", phase == "A" & (v2 == 2 | v2==3) ~ "Returning", phase == "B" ~ "Staying"))

But (v2 == 1 | v2==2) doesn't account for the order of the values. Any ideas? Apologies if it sounds complicated happy to clarify further.


Solution

  • Create a grouping on 'v1' based on whether the adjacent elements are same or not (rleid) after grouping by 'id', then using case_when, if the 'v1' values are "B", return 'Staying', if 'v1' is 'A' and 'grp' is 1, then 'Going' and if 'grp' is greater than 1, then 'Returning'

    library(data.table)
    library(dplyr)
    mydata %>% 
       group_by(id) %>%
       mutate(grp = rleid(v1), phase = case_when(v1 == 'B' ~ 'Staying',
            v1 == 'A' & grp == 1 ~ 'Going', v1 == 'A' &
              grp > 1 ~ 'Returning')) %>%
       ungroup %>%
       select(-grp)