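What I want to do:

If `hmonth=2` and `hyear=2000`, subtract the `wageratio.female` observation for `hmonth=1` and `hyear=2000` from the one for `hmonth=2` and `hyear=2000`.

If `hmonth=2` and `hyear=2001`, subtract the `wageratio.female` observation for `hmonth=1` and `hyear=2001` from the one for `hmonth=2` and `hyear=2001`.

Repeat this for every `hmonth` and `hyear`, so that within each year each month's value is subtracted from the following month's value.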
Create a variable called `wageratio.lags` to hold the differences.
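In symbols (notation introduced here for illustration, not from the original post): with $w_{m,y}$ the `wageratio.female` value for month $m$ of year $y$, the value wanted on row $(m, y)$ is

$$
\text{wageratio.lags}_{m,y} =
\begin{cases}
w_{m+1,\,y} - w_{m,\,y} & \text{if month } m+1 \text{ exists in year } y,\\
\text{NA} & \text{otherwise.}
\end{cases}
$$

For example, row $(1, 2000)$ gets $-0.44 - (-0.43) = -0.01$, which is the first `wageratio.lags` entry in the desired output.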
Below is a small section of my attempt at a `for` loop. Should I be using a `for` loop to get my desired output?
```python
differences = list()
for i in range(len(hmonth)):
    # Check if the current pair is (2, 2000) or (2, 2001)
    if hmonth[i] == 2:
        if hyear[i] == 2000:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2000
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
        elif hyear[i] == 2001:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2001
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
```
```
Error: unexpected symbol in "for i"
```
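A loop can work once it is written in R syntax (`for (i in ...) { ... }`). The following is a minimal sketch, not the approach from the answer below: it rebuilds the sample data from this post and, for each row, explicitly looks up the next month (`hmonth + 1`) of the same year, so it does not depend on row order.

```r
# Sample data from the post
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear  = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

# Start with NA; rows with no following month in the same year stay NA
df$wageratio.lags <- NA_real_

for (i in seq_len(nrow(df))) {
  # Row index of the next month (hmonth + 1) in the same year, if it exists
  j <- which(df$hyear == df$hyear[i] & df$hmonth == df$hmonth[i] + 1)
  if (length(j) == 1) {
    # Next month's value minus this month's value
    df$wageratio.lags[i] <- df$wageratio_female[j] - df$wageratio_female[i]
  }
}

df
```

Unlike `lead()`, this gives NA when a month is missing from a year instead of jumping to the next available month. It also scans the whole data frame once per row, so a vectorised approach scales better on large data.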
Desired output:
| hmonth | hyear | wageratio.female | wageratio.lags |
|---|---|---|---|
| 1 | 2000 | -0.43 | -0.01 |
| 1 | 2001 | 0.18 | -0.62 |
| 2 | 2000 | -0.44 | 0.12 |
| 2 | 2001 | -0.44 | -0.47 |
| 3 | 2000 | -0.32 | -0.45 |
| 3 | 2001 | -0.91 | 0.70 |
| 4 | 2000 | -0.77 | 1.24 |
| 4 | 2001 | -0.21 | NA |
| 5 | 2000 | 0.47 | NA |
Sample data:

```r
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)
```
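The error occurs because the loop is written in Python syntax (`for i in range(...)`), which R cannot parse. You don't need a loop here, though: dplyr's `lead()`/`lag()` functions can do this without one. For example: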
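```r
library(dplyr)

df %>%
  group_by(hyear) %>%
  # Sort by month so lead() sees each year's months in ascending order
  arrange(hmonth) %>%
  # Next month's value minus this month's value, within each year;
  # the last month of each year has no lead and gets NA
  mutate(wageratio.lags = lead(wageratio_female) - wageratio_female) %>%
  ungroup()
```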
produces:

```
  wageratio_female hmonth     hyear      wageratio.lags
             <dbl> <hvn_lbll> <hvn_lbll>          <dbl>
1            -0.43 1          2000              -0.0100
2             0.18 1          2001              -0.62
3            -0.44 2          2000               0.12
4            -0.44 2          2001              -0.47
5            -0.32 3          2000              -0.45
6            -0.91 3          2001               0.7
7            -0.77 4          2000               1.24
8            -0.21 4          2001              NA
9             0.47 5          2000              NA
```
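If you would rather avoid the dplyr dependency, here is a base R sketch using `ave()` and `diff()` that should give the same column. It assumes each year's months are to be compared in ascending `hmonth` order, so it sorts first:

```r
# Sample data from the post
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear  = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

# Sort so that, within each year, months run in ascending order
df <- df[order(df$hmonth, df$hyear), ]

# Within each hyear group, diff() gives next-minus-current;
# pad the last month of each year with NA so lengths match
df$wageratio.lags <- ave(df$wageratio_female, df$hyear,
                         FUN = function(x) c(diff(x), NA))

df
```

Like `lead()`, `diff()` compares consecutive rows, so if a month were missing from a year it would be skipped over rather than producing NA.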