I have dataset like the following:
id <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2)
v1 <- c("A","A","A","A","B","B","B","B","B","A","A","A","A","A","A","A","A","B","B","B","B","B","A","A","A","A")
v2 <- c(1,1,1,2,2,2,2,2,2,2,3,3,3,1,1,1,2,2,2,2,2,2,2,3,3,3)
mydata<- data.frame(id,v1,v2)
v1 represents moving or not from a condition A= Moving, B= Not moving and v2 is a sequence from locations every id passess from, 1->2->3. I want to add a column that says whether: id moving from 1 to 2, id remains at 2, id moving from 2 to 3.
Specifically, I try to add a column to the data frame based on the following conditions: if v1=A (id moving) and v2=1 or (v2=2 if previously v2 had been 1) phase= "Going", if v1=B (not moving) phase "Staying" and if v1=A (moving) and v2=3 or (v2=2 following a "Staying" phase) phase = "Returning".
Specifically, my intended output is:
phase <- c("Going","Going","Going","Going","Staying","Staying","Staying","Staying","Staying","Returning","Returning","Returning","Returning","Going","Going","Going","Going","Staying","Staying","Staying","Staying","Staying","Returning","Returning","Returning","Returning")
mydata2 <- cbind(mydata, phase)
I tried the following:
mydata <-mydata %>% group_by(id)%>% mutate(phase= case_when(v1 == "A" & (v2 == 1 | v2==2) ~ "Going", phase == "A" & (v2 == 2 | v2==3) ~ "Returning", phase == "B" ~ "Staying"))
But (v2 == 1 | v2==2)
doesn't account for the order of the values. Any ideas? Apologies if it sounds complicated happy to clarify further.
Create a grouping on 'v1' based on whether the adjacent elements are same or not (rleid
) after grouping by 'id', then using case_when
, if the 'v1' values are "B", return 'Staying', if 'v1' is 'A' and 'grp' is 1, then 'Going' and if 'grp' is greater than 1, then 'Returning'
library(data.table)
library(dplyr)
mydata %>%
group_by(id) %>%
mutate(grp = rleid(v1), phase = case_when(v1 == 'B' ~ 'Staying',
v1 == 'A' & grp == 1 ~ 'Going', v1 == 'A' &
grp > 1 ~ 'Returning')) %>%
ungroup %>%
select(-grp)