r, dplyr, reshape2, tidyr

Sequencing along two variables of interest in R


I am trying to create a sequence along two variables, name and location, to track how people have moved from one location to another. I have the following data:

name     <- c("John", "John", "John", "Sam", "Sam", "Robert", "Robert", "Robert")
location <- c("London", "London", "Newyork", "Houston", "Houston", "London", "Paris", "Paris")
start_yr <- c(2012, 2012, 2014, 2014, 2014, 2012, 2013, 2013)
end_yr   <- c(2013, 2013, 2015, 2015, 2015, 2013, 2015, 2015)

df <- data.frame(name, location, start_yr, end_yr)

I need to use seq_along over name and location and create a per-year transition variable that shows whether the person moved in that year. I tried the following, but it didn't work well: I was getting strange results, and the sequence for a name sometimes doesn't start at 1. Any suggestions on how to approach this problem?

ave(df$name,df$location, FUN = seq_along)
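A note on why that call misbehaves: ave() groups by every argument passed after x, so the call above numbers rows within each location only and ignores name. It also returns a vector of the same type as x, so passing the name column itself as x does not give integer counters. A minimal sketch, assuming the goal is a counter that restarts at 1 within every name and location pair, passes an integer dummy vector as x and both columns as grouping variables (the seq column name is just for illustration):

```r
# Sample data from the question
name     <- c("John", "John", "John", "Sam", "Sam", "Robert", "Robert", "Robert")
location <- c("London", "London", "Newyork", "Houston", "Houston", "London", "Paris", "Paris")
start_yr <- c(2012, 2012, 2014, 2014, 2014, 2012, 2013, 2013)
end_yr   <- c(2013, 2013, 2015, 2015, 2015, 2013, 2015, 2015)
df <- data.frame(name, location, start_yr, end_yr)

# x is an integer dummy vector; name AND location are both grouping
# variables, so seq_along() restarts at 1 within every name/location pair
df$seq <- ave(rep(1L, nrow(df)), df$name, df$location, FUN = seq_along)
df$seq
#> [1] 1 2 1 1 2 1 1 2
```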

I would like to get a result like this (shown for John):

   name location move year
   John London   1    2012
   John London   0    2013
   John Newyork  1    2014
   John Newyork  0    2015

Solution

  • If I understand correctly, you could complete your data frame by expanding each name and location combination to every year from its minimum start_yr to its maximum end_yr, then group by name, order by start_yr, and use lag() to check whether the location changed:

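    library(dplyr)
    library(tidyr)
    
    df %>%
      group_by(name, location) %>%
      # add a row for every year from the first start_yr to the last end_yr
      complete(start_yr = full_seq(min(start_yr):max(end_yr), 1)) %>%
      group_by(name) %>%
      # sort by year within each person; .by_group keeps rows ordered by name
      arrange(start_yr, .by_group = TRUE) %>%
      # unary + turns TRUE/FALSE into 1/0; the first row per name gets NA
      mutate(move = +(lag(location) != location))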
    

    This returns NA when a name has no previous location (its first row), 0 if the location is the same as in the previous row, and 1 if it changed:

    #Source: local data frame [14 x 5]
    #Groups: name [3]
    #
    #     name location start_yr end_yr  move
    #   (fctr)   (fctr)    (dbl)  (dbl) (int)
    #1    John   London     2012   2013    NA
    #2    John   London     2012   2013     0
    #3    John   London     2013     NA     0
    #4    John  Newyork     2014   2015     1
    #5    John  Newyork     2015     NA     0
    #6  Robert   London     2012   2013    NA
    #7  Robert   London     2013     NA     0
    #8  Robert    Paris     2013   2015     1
    #9  Robert    Paris     2013   2015     0
    #10 Robert    Paris     2014     NA     0
    #11 Robert    Paris     2015     NA     0
    #12    Sam  Houston     2014   2015    NA
    #13    Sam  Houston     2014   2015     0
    #14    Sam  Houston     2015     NA     0
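For comparison, here is a sketch (not part of the original answer) that aims directly at the layout asked for in the question: one row per name and year, with move = 1 whenever the location differs from the previous row for that person, counting each person's first year as a move, as the expected table does. It assumes exact duplicate rows are redundant and relies on tidyr's unnest() and dplyr's arrange(.by_group = TRUE); the moves and year names are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
name     <- c("John", "John", "John", "Sam", "Sam", "Robert", "Robert", "Robert")
location <- c("London", "London", "Newyork", "Houston", "Houston", "London", "Paris", "Paris")
start_yr <- c(2012, 2012, 2014, 2014, 2014, 2012, 2013, 2013)
end_yr   <- c(2013, 2013, 2015, 2015, 2015, 2013, 2015, 2015)
df <- data.frame(name, location, start_yr, end_yr)

moves <- df %>%
  distinct() %>%                                 # drop exact duplicate rows
  mutate(year = Map(seq, start_yr, end_yr)) %>%  # list of years for each stay
  unnest(year) %>%                               # one row per stay and year
  group_by(name) %>%
  arrange(year, .by_group = TRUE) %>%            # order years within each person
  # 1 for the first row of a person or when the location changed, else 0
  mutate(move = as.integer(is.na(lag(location)) | location != lag(location))) %>%
  ungroup() %>%
  select(name, location, move, year)

moves
```

For John this gives London 2012 (1), London 2013 (0), Newyork 2014 (1), Newyork 2015 (0), matching the table in the question. Robert's London (2012 to 2013) and Paris (2013 to 2015) stays overlap, so he gets two 2013 rows; how to treat such overlaps is a decision about the data that this sketch does not make for you.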