
How to create a wave variable that takes into account respondents who drop and come back later to the panel?


I am trying to create a wave variable for panel data. I followed the suggestions in this thread: Create a sequential number (counter) for rows within each group of a dataframe. Using this code: df = df %>% group_by(id) %>% mutate(wave = row_number()), I get the wave variable shown in the output below.

However, the issue with this variable is that it doesn't capture cases where respondents drop out of the panel for some time and then come back. For instance, the respondent with ID 1 drops out of the panel from 2007 until 2009 and comes back in 2010. The code above generates wave 3 for that row, whereas in reality it is wave 6, as shown in the real_wave variable. Could someone please let me know whether there is a way to achieve the output in the real_wave variable using dplyr?

    id   year   wave   real_wave
     1   2005      1           1
     1   2006      2           2
     1   2010      3           6
     2   2008      1           1
     2   2009      2           2
     2   2012      3           5
    
    structure(list(id = structure(c(1, 1, 1, 2, 2, 2), format.stata = "%9.0g"), 
        year = structure(c(2005, 2006, 2010, 2008, 2009, 2012), format.stata = "%9.0g"), 
        wave = structure(c(1, 2, 3, 1, 2, 3), format.stata = "%9.0g"), 
        real_wave = structure(c(1, 2, 6, 1, 2, 5), format.stata = "%9.0g")), row.names = c(NA, 
    -6L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • Sounds like what you need isn't the row number, but the difference between each year and the first year of each group (plus one, because you start counting at 1). Therefore:

    df <- df %>% group_by(id) %>%
      mutate(wave = year - first(year) + 1)
    

    gives

    > df
    # A tibble: 6 x 4
    # Groups:   id [2]
         id  year  wave real_wave
      <dbl> <dbl> <dbl>     <dbl>
    1     1  2005     1         1
    2     1  2006     2         2
    3     1  2010     6         6
    4     2  2008     1         1
    5     2  2009     2         2
    6     2  2012     5         5
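One caveat worth sketching: first(year) picks the first row of each group, so the result above relies on the rows already being sorted by year within each id. A minimal variant, assuming the data may arrive unsorted and that waves are annual, uses min(year) instead. The small data frame below is made up for illustration and simply shuffles the years from the question.

```r
library(dplyr)

# Illustrative data: same ids and years as in the question,
# but deliberately not sorted by year within each id.
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2010, 2005, 2006, 2008, 2012, 2009)
)

df <- df %>%
  group_by(id) %>%
  # Distance from each respondent's earliest year, counting from 1.
  mutate(wave = year - min(year) + 1) %>%
  ungroup() %>%
  arrange(id, year)

df
```

If the survey were not fielded every year (for example, every two years), the difference would presumably need to be divided by the spacing between waves before adding 1.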