For each year, I want to subtract the share of the current month from the share of the previous month within the same year, except for January where the share of January is subtracted from the share of December of the previous year.
This is what I attempted but instead of getting my desired output, I'm getting NAs:
data %>%
arrange(month) %>%
mutate(unique = ave(month, month, FUN = seq_along)) %>%
group_by(year) %>%
mutate(
lags = ifelse(
month == 1,
share - share[month == 12 & year == unique(year) - 1],
share - share[month == month - 1 & year == unique(year)]
)
)
However, it works fine when I run these codes
mutate(lag = ifelse(month == 1, share[month == 2] - share[month == 1],
ifelse(month == 2, share[month == 3] - share[month == 2],
ifelse(month == 3, share[month == 4] - share[month == 3],
ifelse(month == 4, share[month == 5] - share[month == 4],
ifelse(month == 5, share[month == 6] - share[month == 5],
ifelse(month == 6, share[month == 7] - share[month == 6],
ifelse(month == 7, share[month == 8] - share[month == 7],
ifelse(month == 8, share[month == 9] - share[month == 8],
ifelse(month == 9, share[month == 10] - share[month == 9],
ifelse(month == 10, share[month == 11] - share[month == 10],
ifelse(month == 11, share[month == 12] - share[month == 11], NA)
But the issue is how can I specify when month=12 and year=2000, subtract share from month=1 and year=2001. When month=12 and year=2014, subtract share from month=1 and year=2015.
data <- data.frame(
month = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 12),
year = c(2000, 2000, 2015, 2000, 2000, 2015, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2014, 2014, 2014, 2014, 2014, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2014),
share = c(0.2, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.8, 0.5, 0.1, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.8, 0.5, 0.1, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3)
)
Desired output:
# A tibble: 26 × 5
# Groups: year [4]
month year share unique lag1
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2000 0.2 1 0.4
2 1 2000 0.4 2 0.3
3 1 2015 0.1 3 0.2
4 2 2000 0.6 1 -0.4
5 2 2000 0.7 2 0.1
6 2 2015 0.3 3 NA
7 3 2000 0.2 1 0.3
8 3 2000 0.8 2 -0.7
9 4 2000 0.5 1 -0.3
10 4 2000 0.1 2 NA
11 5 2000 0.2 1 NA
12 6 2001 0.4 1 NA
13 6 2001 0.1 2 NA
14 7 2014 0.6 1 -0.3
15 7 2014 0.7 2 NA
16 8 2014 0.3 1 -0.1
17 9 2014 0.2 1 NA
18 9 2014 0.8 2 NA
19 10 2000 0.5 1 -0.3
20 10 2000 0.1 2 NA
21 11 2000 0.2 1 NA
22 11 2001 0.4 2 0.2
23 11 2001 0.1 3 0.6
24 12 2001 0.6 1 NA
25 12 2001 0.7 2 NA
26 12 2014 0.3 3 -0.2
library(dplyr)
data %>%
arrange(month) %>%
mutate(diff = share[pmatch(month + 1, month)] - share,
year2 = if_else(month == 1, year - 1, year), .by = year) %>%
mutate(diff = coalesce(diff, share[pmatch(month - 11, month)] - share),
.by = year2) %>%
select(-year2)
Here pmatch
is used instead of match
because values once matched in pmatch
are excluded from the search for subsequent matches. A simple example is that match(c(1,1), c(1,1))
returns 1, 1, whereas pmatch(c(1,1), c(1,1))
returns 1, 2. I use this feature to deal with the repeated measures in some pairs of year-month.
month year share diff
1 1 2000 0.2 0.4
2 1 2000 0.4 0.3
3 1 2015 0.1 0.2
4 2 2000 0.6 -0.4
5 2 2000 0.7 0.1
6 2 2015 0.3 NA
7 3 2000 0.2 0.3
8 3 2000 0.8 -0.7
9 4 2000 0.5 -0.3
10 4 2000 0.1 NA
11 5 2000 0.2 NA
12 6 2001 0.4 NA
13 6 2001 0.1 NA
14 7 2014 0.6 -0.3
15 7 2014 0.7 NA
16 8 2014 0.3 -0.1
17 9 2014 0.2 NA
18 9 2014 0.8 NA
19 10 2000 0.5 -0.3
20 10 2000 0.1 NA
21 11 2000 0.2 NA
22 11 2001 0.4 0.2
23 11 2001 0.1 0.6
24 12 2001 0.6 NA
25 12 2001 0.7 NA
26 12 2014 0.3 -0.2