r time-series

Difference for time-series data


I am working on a time series analysis. To test the data for stationarity, I want to take the first difference of the data. The data has 22 feature columns and many rows. The head of the data (a data.frame object) is below: [image not included]

My question: I am trying to compute the first- and second-order differences (not lags) of each column (the second and third columns should be excluded) so that I can pass them to the stationarity test. My attempt is like this:

diff1 <- c()

for (i in 1:length(city)) {
  diff1[i] <- paste(city[i], "diff1", sep=".") 
}

for (x in 1:length(city)) {
  assign(paste(df[x]), diff((eval(parse(text=city[x])))[, 1]))
}

I think it should not be this complicated, and the code above did not compute the difference for each city column. How should I do that? Thanks.

Edit: The code above would generate a set of separate vectors, because ACF and other time series functions require a univariate vector. For differencing alone, however, that is unnecessary. Just convert the data set to a time series object (for example with ts() from base R's stats package), then call diff(x, differences = n), where n is the order of differencing. The result is n observations shorter than the input and contains no missing values. (ref: https://atsa-es.github.io/atsa-labs/sec-tslab-differencing-to-remove-a-trend-or-seasonal-effects.html)

The answers below did provide some help. However, in time series analysis, neither padding with 0 nor leaving NA values is acceptable. For the purpose of acceptance, I am giving credit to the first respondent, who provided the most structured answer. For others who run into the same issue, use the approach described in this edit.
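The approach from the edit can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code. The data frame, its column names, and the monthly start date are made up for the example, and only the date column is excluded here (the real data would also drop the second and third columns).

```r
# Hypothetical example data: a date column plus three numeric feature columns.
set.seed(123)
df <- data.frame(
  date = seq(as.Date("2000-01-20"), by = "month", length.out = 6),
  Composite.20 = runif(6, 100, 120),
  Nation.US = runif(6, 100, 120),
  AZ.Phoenix = runif(6, 100, 120)
)

# Keep only the feature columns and convert them to a monthly ts object.
# The start and frequency are assumptions that match the made-up dates.
features <- df[, setdiff(names(df), "date")]
x <- ts(features, start = c(2000, 1), frequency = 12)

# First- and second-order differences of every column at once.
d1 <- diff(x, differences = 1)
d2 <- diff(x, differences = 2)

dim(d1)  # one row fewer than x
dim(d2)  # two rows fewer than x
```

Because diff() shortens the series instead of padding it, each column of d1 or d2 (for example d1[, "AZ.Phoenix"]) can be passed directly to a function that expects a univariate series without NA values.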


Solution

  • The tsibble package can help compute differences. This example adds columns with the first difference for three features.

    library(dplyr)
    
    set.seed(123)
    df <- data.frame(
      date = c('2000-01-20', '2000-02-20', '2000-03-20', '2000-04-20', '2000-05-20', '2000-06-20'),
      Composite.20 = runif(6, 100, 120),
      Nation.US = runif(6, 100, 120),
      AZ.Phoenix = runif(6, 100, 120)
    )
    
    df
    #>         date Composite.20 Nation.US AZ.Phoenix
    #> 1 2000-01-20     105.7516  110.5621   113.5514
    #> 2 2000-02-20     115.7661  117.8484   111.4527
    #> 3 2000-03-20     108.1795  111.0287   102.0585
    #> 4 2000-04-20     117.6603  109.1323   117.9965
    #> 5 2000-05-20     118.8093  119.1367   104.9218
    #> 6 2000-06-20     100.9111  109.0667   100.8412
    
    df |>
      mutate(across(-date, tsibble::difference, .names = '{.col}_first_diff'))
    #>         date Composite.20 Nation.US AZ.Phoenix Composite.20_first_diff
    #> 1 2000-01-20     105.7516  110.5621   113.5514                      NA
    #> 2 2000-02-20     115.7661  117.8484   111.4527               10.014552
    #> 3 2000-03-20     108.1795  111.0287   102.0585               -7.586564
    #> 4 2000-04-20     117.6603  109.1323   117.9965                9.480810
    #> 5 2000-05-20     118.8093  119.1367   104.9218                1.148998
    #> 6 2000-06-20     100.9111  109.0667   100.8412              -17.898216
    #>   Nation.US_first_diff AZ.Phoenix_first_diff
    #> 1                   NA                    NA
    #> 2             7.286271             -2.098745
    #> 3            -6.819681             -9.394174
    #> 4            -1.896406             15.938006
    #> 5            10.004372            -13.074745
    #> 6           -10.069984             -4.080564
    

    Created on 2024-04-06 with reprex v2.0.2
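
    The question also asks for the second-order difference. A possible extension of the same idea, assuming the `differences` argument of tsibble::difference() and the same made-up data, might look like this (the `_second_diff` suffix is just an illustrative name):

```r
library(dplyr)

# Same hypothetical data as in the first-difference example.
set.seed(123)
df <- data.frame(
  date = c('2000-01-20', '2000-02-20', '2000-03-20',
           '2000-04-20', '2000-05-20', '2000-06-20'),
  Composite.20 = runif(6, 100, 120),
  Nation.US = runif(6, 100, 120),
  AZ.Phoenix = runif(6, 100, 120)
)

# Add one second-order difference column per feature.
# The result keeps the original row count, so the first two rows are NA.
df2 <- df |>
  mutate(across(-date,
                \(x) tsibble::difference(x, differences = 2),
                .names = '{.col}_second_diff'))

# Drop the leading NA values before passing a column to a stationarity test.
second_diff_phoenix <- na.omit(df2$AZ.Phoenix_second_diff)
```

    The leading NA values keep the new columns aligned with the original rows; removing them afterwards gives a series without missing values, which is what the asker's edit calls for.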