I am working on a time series analysis. To test the data for stationarity, I want to take the first difference of the data. The data set has 22 feature columns and many rows. The head of the data (a data.frame object) is below:
I am trying to compute the first- and second-order differences (not lags) of each column, excluding the second and third columns, so that I can pass them to a stationarity test. My attempt is:
diff1 <- c()
for (i in 1:length(city)) {
  diff1[i] <- paste(city[i], "diff1", sep=".")
}
for (x in 1:length(city)) {
  assign(paste(df[x]), diff((eval(parse(text=city[x])))[, 1]))
}
I think it should not be this complicated, and the code above did not compute the difference for each city column. How should I do this? Thanks.
Edit: The code above generates a set of separate vectors, because ACF and other time series functions require a univariate vector as input.
However, for the purpose of differencing alone, we don't have to do so. Just convert the data set to a time series object with ts() (from the stats package that ships with R, not from tseries), then call diff(x, differences = n), where n is the order of differencing. The result is simply n observations shorter, so it does not contain any missing values. (ref: https://atsa-es.github.io/atsa-labs/sec-tslab-differencing-to-remove-a-trend-or-seasonal-effects.html)
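For anyone who wants to see that approach end to end, here is a minimal sketch in base R. The data frame `prices` and its column names are made up for illustration (the real data has 22 feature columns), and I am assuming monthly data starting in January 2000 with the first three columns excluded.

```r
# Toy data: names and values are hypothetical, not the real data set
set.seed(1)
prices <- data.frame(
  date   = seq(as.Date("2000-01-01"), by = "month", length.out = 8),
  region = "US",   # placeholder for a column that should not be differenced
  code   = 1L,     # placeholder for a column that should not be differenced
  city_a = cumsum(rnorm(8)),
  city_b = cumsum(rnorm(8))
)

# Keep only the feature columns and turn them into a multivariate ts object
x <- ts(prices[, -(1:3)], start = c(2000, 1), frequency = 12)

# diff() works column by column on a multivariate series
d1 <- diff(x, differences = 1)  # first-order difference, 7 rows
d2 <- diff(x, differences = 2)  # second-order difference, 6 rows

# Each column is a univariate series without NA values
city_a_d1 <- d1[, "city_a"]
```

Each column of `d1` or `d2` can then be passed on its own to whichever stationarity test is being used.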
The answers below did help. However, in time series analysis, neither padding with 0 nor leaving NA values is acceptable. For the purpose of acceptance, I will give credit to the first respondent, who provided the most structured answer. Anyone facing the same issue can use the approach in my edit above.
The tsibble package can help compute differences. This example adds columns with the first difference for three features.
library(dplyr)
set.seed(123)
df <- data.frame(
  date = c('2000-01-20', '2000-02-20', '2000-03-20', '2000-04-20', '2000-05-20', '2000-06-20'),
  Composite.20 = runif(6, 100, 120),
  Nation.US = runif(6, 100, 120),
  AZ.Phoenix = runif(6, 100, 120)
)
df
#> date Composite.20 Nation.US AZ.Phoenix
#> 1 2000-01-20 105.7516 110.5621 113.5514
#> 2 2000-02-20 115.7661 117.8484 111.4527
#> 3 2000-03-20 108.1795 111.0287 102.0585
#> 4 2000-04-20 117.6603 109.1323 117.9965
#> 5 2000-05-20 118.8093 119.1367 104.9218
#> 6 2000-06-20 100.9111 109.0667 100.8412
df |>
  mutate(across(-date, tsibble::difference, .names = '{.col}_first_diff'))
#> date Composite.20 Nation.US AZ.Phoenix Composite.20_first_diff
#> 1 2000-01-20 105.7516 110.5621 113.5514 NA
#> 2 2000-02-20 115.7661 117.8484 111.4527 10.014552
#> 3 2000-03-20 108.1795 111.0287 102.0585 -7.586564
#> 4 2000-04-20 117.6603 109.1323 117.9965 9.480810
#> 5 2000-05-20 118.8093 119.1367 104.9218 1.148998
#> 6 2000-06-20 100.9111 109.0667 100.8412 -17.898216
#> Nation.US_first_diff AZ.Phoenix_first_diff
#> 1 NA NA
#> 2 7.286271 -2.098745
#> 3 -6.819681 -9.394174
#> 4 -1.896406 15.938006
#> 5 10.004372 -13.074745
#> 6 -10.069984 -4.080564
Created on 2024-04-06 with reprex v2.0.2
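Since the question also asks for the second-order difference, the same pattern could be extended roughly as follows. This is only a sketch: it rebuilds the toy `df` from above and passes `differences = 2` to `tsibble::difference()`. The `_second_diff` suffix and the `na.omit()` step are my own additions for illustration, for the case where the leading NA rows must be dropped before testing.

```r
library(dplyr)

# Same toy data as in the example above
set.seed(123)
df <- data.frame(
  date = c('2000-01-20', '2000-02-20', '2000-03-20', '2000-04-20', '2000-05-20', '2000-06-20'),
  Composite.20 = runif(6, 100, 120),
  Nation.US = runif(6, 100, 120),
  AZ.Phoenix = runif(6, 100, 120)
)

# Second-order difference of every column except date;
# the first two rows of each new column are NA by construction
second_diff <- df |>
  mutate(across(-date, \(x) tsibble::difference(x, differences = 2),
                .names = '{.col}_second_diff'))

# Drop the rows that contain the leading NA values
second_diff_complete <- na.omit(second_diff)
```

After `na.omit()`, the `_second_diff` columns hold the same values that base `diff(x, differences = 2)` would return for each feature.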