Search code examples
rdplyr

subtract the share of the current month from the share of the previous month within the same year


For each year, I want to subtract the share of the current month from the share of the previous month within the same year, except for January where the share of January is subtracted from the share of December of the previous year.

This is what I attempted but instead of getting my desired output, I'm getting NAs:

data %>%
  arrange(month) %>%
  mutate(unique = ave(month, month, FUN = seq_along)) %>%
  group_by(year) %>%
  mutate(
    lags = ifelse(
      month == 1, 
      share - share[month == 12 & year == unique(year) - 1], 
      share - share[month == month - 1 & year == unique(year)]
    )
  )

However, it works fine when I run these codes

mutate(lag = ifelse(month == 1, share[month == 2] - share[month == 1], 
                      ifelse(month == 2, share[month == 3] - share[month == 2],
                             ifelse(month == 3, share[month == 4] - share[month == 3],
                                    ifelse(month == 4, share[month == 5] - share[month == 4],
                                           ifelse(month == 5, share[month == 6] - share[month == 5],
                                                  ifelse(month == 6, share[month == 7] - share[month == 6],
                                                         ifelse(month == 7, share[month == 8] - share[month == 7],
                                                                ifelse(month == 8, share[month == 9] - share[month == 8],
                                                                       ifelse(month == 9, share[month == 10] - share[month == 9],
                                                                              ifelse(month == 10, share[month == 11] - share[month == 10],
                                                                                     ifelse(month == 11, share[month == 12] - share[month == 11], NA)

But the issue is how can I specify when month=12 and year=2000, subtract share from month=1 and year=2001. When month=12 and year=2014, subtract share from month=1 and year=2015.

data <- data.frame(
  month = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 12),
  year = c(2000, 2000, 2015, 2000, 2000, 2015, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2014, 2014, 2014, 2014, 2014, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2014),
  share = c(0.2, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.8, 0.5, 0.1, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.8, 0.5, 0.1, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3)
)

Desired output:

# A tibble: 26 × 5
# Groups:   year [4]
   month  year share unique  lag1
   <dbl> <dbl> <dbl>  <dbl> <dbl>
 1     1  2000   0.2      1    0.4
 2     1  2000   0.4      2    0.3
 3     1  2015   0.1      3    0.2
 4     2  2000   0.6      1    -0.4
 5     2  2000   0.7      2    0.1
 6     2  2015   0.3      3    NA
 7     3  2000   0.2      1    0.3
 8     3  2000   0.8      2    -0.7
 9     4  2000   0.5      1    -0.3
10     4  2000   0.1      2    NA
11     5  2000   0.2      1    NA
12     6  2001   0.4      1    NA
13     6  2001   0.1      2    NA
14     7  2014   0.6      1    -0.3
15     7  2014   0.7      2    NA
16     8  2014   0.3      1    -0.1
17     9  2014   0.2      1    NA
18     9  2014   0.8      2    NA
19    10  2000   0.5      1    -0.3
20    10  2000   0.1      2    NA
21    11  2000   0.2      1    NA
22    11  2001   0.4      2    0.2
23    11  2001   0.1      3    0.6
24    12  2001   0.6      1    NA
25    12  2001   0.7      2    NA
26    12  2014   0.3      3    -0.2

Solution

  • library(dplyr)
    
    data %>%
      arrange(month) %>% 
      mutate(diff = share[pmatch(month + 1, month)] - share,
             year2 = if_else(month == 1, year - 1, year), .by = year) %>%
      mutate(diff = coalesce(diff, share[pmatch(month - 11, month)] - share),
             .by = year2) %>%
      select(-year2)
    

    Here pmatch is used instead of match because values once matched in pmatch are excluded from the search for subsequent matches. A simple example is that match(c(1,1), c(1,1)) returns 1, 1, whereas pmatch(c(1,1), c(1,1)) returns 1, 2. I use this feature to deal with the repeated measures in some pairs of year-month.

    Output
       month year share diff
    1      1 2000   0.2  0.4
    2      1 2000   0.4  0.3
    3      1 2015   0.1  0.2
    4      2 2000   0.6 -0.4
    5      2 2000   0.7  0.1
    6      2 2015   0.3   NA
    7      3 2000   0.2  0.3
    8      3 2000   0.8 -0.7
    9      4 2000   0.5 -0.3
    10     4 2000   0.1   NA
    11     5 2000   0.2   NA
    12     6 2001   0.4   NA
    13     6 2001   0.1   NA
    14     7 2014   0.6 -0.3
    15     7 2014   0.7   NA
    16     8 2014   0.3 -0.1
    17     9 2014   0.2   NA
    18     9 2014   0.8   NA
    19    10 2000   0.5 -0.3
    20    10 2000   0.1   NA
    21    11 2000   0.2   NA
    22    11 2001   0.4  0.2
    23    11 2001   0.1  0.6
    24    12 2001   0.6   NA
    25    12 2001   0.7   NA
    26    12 2014   0.3 -0.2