r for-loop dplyr tidyverse

Creating a lag variable using a for loop


What I want to do: within each hyear, take the difference in wageratio.female between consecutive hmonth values. For hyear=2000, that is the value at hmonth=2 minus the value at hmonth=1; for hyear=2001, the value at hmonth=2 minus the value at hmonth=1 in 2001; and so on for every hmonth and hyear. The differences should be stored in a new variable called wageratio.lags.

Below is a small section of my attempt at a for loop. Should I be using a for loop to achieve my desired output?

differences = list()

for i in range(len(hmonth)):
    # Check if the current pair is (2, 2000) or (2, 2001)
    if hmonth[i] == 2:
        if hyear[i] == 2000:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2000
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
        elif hyear[i] == 2001:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2001
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)

Running this gives:

Error: unexpected symbol in "for i"

Desired output:

hmonth hyear wageratio.female wageratio.lags
1 2000 -0.43 -0.01
1 2001 0.18 -0.62
2 2000 -0.44 0.12
2 2001 -0.44 -0.47
3 2000 -0.32 -0.45
3 2001 -0.91 0.70
4 2000 -0.77 1.24
4 2001 -0.21 NA
5 2000 0.47 NA

Sample data:

df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

Solution

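  • You can do this without a loop using dplyr's lead/lag functions: group by hyear, sort by hmonth, and subtract each row's value from the next month's value within the same year. For example: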

    library(dplyr)
    df %>% 
      group_by(hyear) %>% 
      arrange(hmonth) %>% 
      mutate(wageratio.lags = lead(wageratio_female) - wageratio_female) %>%
      ungroup()
    

    produces

      wageratio_female     hmonth      hyear wageratio.lags
                 <dbl> <hvn_lbll> <hvn_lbll>          <dbl>
    1            -0.43          1       2000        -0.0100
    2             0.18          1       2001        -0.62
    3            -0.44          2       2000         0.12
    4            -0.44          2       2001        -0.47
    5            -0.32          3       2000        -0.45
    6            -0.91          3       2001         0.7
    7            -0.77          4       2000         1.24
    8            -0.21          4       2001        NA
    9             0.47          5       2000        NA
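
On the loop itself: the attempt in the question is written in Python syntax (for i in range(...):, .append(), .index()), which is why R stops with "unexpected symbol". If a loop is still preferred, a rough base R sketch could look like the one below. It assumes the sample df from the question, and the helper name nxt is just an illustrative choice. Unlike lead(), which takes the next row within the year, it looks for a row whose hmonth is exactly one higher.

```r
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

# Start with NA so rows without a following month stay NA
df$wageratio.lags <- NA_real_

for (i in seq_len(nrow(df))) {
  # Index of the row holding the next month of the same year, if there is one
  # (nxt is a hypothetical helper name, not from the original post)
  nxt <- which(df$hyear == df$hyear[i] & df$hmonth == df$hmonth[i] + 1)
  if (length(nxt) == 1) {
    # Next month's value minus the current month's value
    df$wageratio.lags[i] <- df$wageratio_female[nxt] - df$wageratio_female[i]
  }
}

print(df)
```

On this sample data the loop fills wageratio.lags with the same values as the dplyr version, but it scans the whole data frame once per row, so the grouped lead() approach is likely the better fit for larger data.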