
Time series in R: How do I calculate percent change from a fixed year for multiple time series variables in R?


For multiple time series variables, how do I calculate the percent change over time relative to a fixed year?

# Example data: yearly infection counts, patient days and rate
tInfPatDays <- structure(list(haiarYear = 2009:2012, 
               anyInf = c(25914L, 23601L, 22713L, 22654L), 
               haiarPatDays = c(10402161L, 10289079L, 10212208L, 10033090L), 
               rate = c(2.49121312388839, 
                        2.29379131018432, 
                        2.22410276014746, 
                        2.25792851454537)), 
               .Names = c("haiarYear", "anyInf", "haiarPatDays", "rate"), 
               row.names = c(NA, -4L), 
               class = "data.frame")
# Drop the year column and use it as the time index instead
tsInfPatDays <- ts(tInfPatDays[, -1], start = 2009)
options(digits = 2)

This produces a time series object that looks like this:

Time Series:
Start = 2009 
End = 2012 
Frequency = 1 
     anyInf haiarPatDays rate
2009  25914     10402161 2.49
2010  23601     10289079 2.29
2011  22713     10212208 2.22
2012  22654     10033090 2.26

I want to calculate the percent change relative to 2009 for each of the variables: anyInf, haiarPatDays and rate.

For one variable, I can calculate percent change as:

transform(tsInfPatDays, since2009 = (rate-rate[1])/rate[1]*100)

Yielding:

anyInf haiarPatDays rate since2009
 25914     10402161 2.49      0.00
 23601     10289079 2.29     -7.92
 22713     10212208 2.22    -10.72
 22654     10033090 2.26     -9.36

The following calculates percent change relative to the previous year and operates on each variable:

100*(tsInfPatDays/lag(tsInfPatDays, -1)-1)

Giving:

Time Series:
Start = 2010 
End = 2012 
Frequency = 1 
     tsInfPatDays.anyInf tsInfPatDays.haiarPatDays tsInfPatDays.rate
2010               -8.93                    -1.087             -7.92
2011               -3.76                    -0.747             -3.04
2012               -0.26                    -1.754              1.52

Using this as a model, I expected to be able to perform the calculation I needed by indexing the 2009 reference data with tsInfPatDays[1,]:

  anyInf haiarPatDays         rate 
2.59e+04     1.04e+07     2.49e+00

Then:

(tsInfPatDays-tsInfPatDays[1,])/tsInfPatDays[1,]*100

The first row is calculated correctly, but the subsequent rows are clearly wrong.
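
A likely reason, sketched here on a small hypothetical matrix `m` rather than the original data: R stores a matrix column by column and recycles a shorter vector in that same order, so a length-3 reference row only lines up with the first row of a 4-row matrix. The names `m`, `ref`, `wrong` and `recycled` are illustrative assumptions.

```r
# Hypothetical 4 x 3 matrix standing in for the time series data
m <- matrix(c(10, 20, 30, 40,
              100, 200, 300, 400,
              1000, 2000, 3000, 4000), nrow = 4)
ref <- m[1, ]   # reference row: 10, 100, 1000

# ref is recycled down the columns (column-major order), not across rows
wrong <- m - ref

# The values R effectively subtracts, laid out in the shape of m
recycled <- matrix(rep(ref, length.out = length(m)), nrow = nrow(m))

# Row 1 of `recycled` equals ref, so the first row of `wrong` is all zeros;
# row 2 is 100, 1000, 10, which no longer matches the columns
```

Printing `recycled` next to `m` shows where the reference values drift out of their columns.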

I have seen a transposed-matrix approach for row subtraction. As a proof of concept (a difference rather than a percentage), I tried subtracting the reference row from each row of the time series, but got the following error:

t(tsInfPatDays-t(tsInfPatDays[1,]))

Error in `-.default`(tsInfPatDays, t(tsInfPatDays[1, ])) : 
  non-conformable arrays

I get the same error if I first extract the data (with coredata from the zoo package) before using the transposed-matrix approach:

 t(tsInfPatDays-t(drop(coredata(tsInfPatDays[1,]))))

 Error in `-.default`(tsInfPatDays, t(drop(coredata(tsInfPatDays[1, ])))) : 
 non-conformable arrays
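
The error appears to come from transposing the reference row rather than the matrix: `t()` turns the length-3 vector into a 1 x 3 matrix, which does not conform to the 4 x 3 series. Below is a minimal sketch of the transposed approach on a plain matrix copy of the data. The names `m`, `ref`, `pct` and `tsPct` are illustrative, not from the original post.

```r
# Example data from the question, as a plain numeric matrix
m <- cbind(
  anyInf       = c(25914, 23601, 22713, 22654),
  haiarPatDays = c(10402161, 10289079, 10212208, 10033090),
  rate         = c(2.49121312388839, 2.29379131018432,
                   2.22410276014746, 2.25792851454537)
)
ref <- m[1, ]   # 2009 reference values, one per variable

# t(m) is 3 x 4, so recycling ref down its columns pairs each variable
# with its own reference value; transpose back afterwards
pct <- t((t(m) - ref) / ref) * 100

# Restore the yearly time index
tsPct <- ts(pct, start = 2009)
```

Transposing first works because the recycled vector then has the same length as each column of `t(m)`.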

Solution

  • You can loop over the columns and apply the single-variable formula to each one:

     ts(sapply(tsInfPatDays, function(x) (x - x[1]) / x[1] * 100), start = 2009)
    Time Series:
    Start = 2009 
    End = 2012 
    Frequency = 1 
             anyInf haiarPatDays       rate
    2009   0.000000     0.000000   0.000000
    2010  -8.925677    -1.087101  -7.924726
    2011 -12.352396    -1.826092 -10.722100
    2012 -12.580073    -3.548022  -9.364298
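
A variant that does not depend on how `sapply` iterates over a time series matrix is to copy the values into a plain matrix and use `sweep()` to apply the reference row column by column. This is a sketch under stated assumptions: the helper `pct_change_from_base` and its `base_row` argument are hypothetical names introduced here, not part of the original answer.

```r
# Example series from the question
x <- ts(cbind(
  anyInf       = c(25914, 23601, 22713, 22654),
  haiarPatDays = c(10402161, 10289079, 10212208, 10033090),
  rate         = c(2.49121312388839, 2.29379131018432,
                   2.22410276014746, 2.25792851454537)
), start = 2009)

# Hypothetical helper: percent change of every column relative to one row
pct_change_from_base <- function(x, base_row = 1) {
  # Plain numeric matrix, so no ts-specific arithmetic is involved
  m <- matrix(as.numeric(x), nrow = nrow(x),
              dimnames = list(NULL, colnames(x)))
  base <- m[base_row, ]
  # sweep() with MARGIN = 2 matches base[j] to column j
  pct <- sweep(sweep(m, 2, base, "-"), 2, base, "/") * 100
  # Reattach the original time index
  ts(pct, start = start(x), frequency = frequency(x))
}

out <- pct_change_from_base(x)
```

Passing a different `base_row` (for example `2` for 2010) changes the reference year without touching the rest of the calculation.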