Search code examples
rtime-seriesdivision

How to calculate a division with lag? x[-1] / x[-length(x)]


how can I calculate a division with lag 12?

Monthly data example:

  • 2000-01: 179.300
  • 2000-02: 179.400
  • 2000-03: 180.000
  • 2000-04: 180.300
  • 2000-05: 180.700
  • 2000-06: 181.100
  • 2000-07: 181.500
  • 2000-08: 181.900
  • 2000-09: 182.300
  • 2000-10: 182.600
  • 2000-11: 183.100
  • 2000-12: 183.300
  • 2001-01: 183.900
  • 2001-02: 184.400
  • 2001-03: 184.700

Now I would lilke divide:

  • Value January 2001 (183.900) / Value January 2000 (179.300)
  • February 2001 (184.400) / February 2000 (179.400)
  • For march: 184.700/180.000

I tried:

y<- x[-1] / x[-length(x)]

But this is a normal division for each value who is before the other one without lag.

Can I modify the formula with gap 12?

Thank you for helping.

Best regards


Solution

  • We will assume the input data frame DF shown in the Note at the end and that the output required is to add a new column.

    Before we start note that lag in base R does not work on plain vectors. It requires a time series. For example

    # wrong
    with(DF, V2 / lag(V2, -12))
    ##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    ## attr(,"tsp")
    ## [1] 13 27  1
    

    1) zoo Convert DF to a zoo series and use diff.zoo with the indicated arguments. diff.zoo supports lag= and arith= arguments. If the latter is FALSE it uses division rather than subtraction for the diff.

    library(zoo)
    
    z <- read.zoo(DF, FUN = as.yearmon)
    zz <- cbind(z, diff12 = diff(z, lag = 12, arith = FALSE))
    zz
    

    giving this zoo object (use fortify.zoo on it if you want a data frame as the result).

                 z   diff12
    Jan 2000 179.3       NA
    Feb 2000 179.4       NA
    Mar 2000 180.0       NA
    Apr 2000 180.3       NA
    May 2000 180.7       NA
    Jun 2000 181.1       NA
    Jul 2000 181.5       NA
    Aug 2000 181.9       NA
    Sep 2000 182.3       NA
    Oct 2000 182.6       NA
    Nov 2000 183.1       NA
    Dec 2000 183.3       NA
    Jan 2001 183.9 1.025655
    Feb 2001 184.4 1.027871
    Mar 2001 184.7 1.026111
    

    2) ts Another possibility is to use a ts series. This only uses base R.

    tt <- ts(DF$V2, start = c(2000, 1), frequency = 12)
    cbind(tt, diff12 = tt / lag(tt, -12))
    

    The result is this ts series:

                tt   diff12
    Jan 2000 179.3       NA
    Feb 2000 179.4       NA
    Mar 2000 180.0       NA
    Apr 2000 180.3       NA
    May 2000 180.7       NA
    Jun 2000 181.1       NA
    Jul 2000 181.5       NA
    Aug 2000 181.9       NA
    Sep 2000 182.3       NA
    Oct 2000 182.6       NA
    Nov 2000 183.1       NA
    Dec 2000 183.3       NA
    Jan 2001 183.9 1.025655
    Feb 2001 184.4 1.027871
    Mar 2001 184.7 1.026111
    

    3) collapse The fdiff function in the collapse package can be used with the log= argument. Note that exp(log(x) - log(y)) equals x/y.

    library(collapse)
    ftransform(DF, diff12 = exp(fdiff(DF$V2, 12, log = TRUE)))
    

    or

    ftransform(DF, diff12 = V2 / flag(V2, 12))
    

    4) dplyr The lag function in dplyr can be used; however, be very careful using it because it clobbers base R's lag and will result in numerous packages not working. Safest is to exclude it when loading dplyr and then use dplyr::lag . Also be aware that the n= argument of dplyr's lag has the opposite sign of the lag function in base R creating even more confusion.

    library(dplyr, exclude = c("lag", "filter"))
    
    DF %>% mutate(diff12 = V2 / dplyr::lag(V2, 12))
    

    5) Base R The ts solution above only uses base R but here is another base R solution.

    transform(DF, diff12 = c(rep(NA, 12), tail(V2, -12) / head(V2, -12)))
    

    Note

    Lines <- "2000-01: 179.300
    2000-02: 179.400
    2000-03: 180.000
    2000-04: 180.300
    2000-05: 180.700
    2000-06: 181.100
    2000-07: 181.500
    2000-08: 181.900
    2000-09: 182.300
    2000-10: 182.600
    2000-11: 183.100
    2000-12: 183.300
    2001-01: 183.900
    2001-02: 184.400
    2001-03: 184.700"
    
    DF <- read.table(text = Lines, sep = ":")