
ts() frequency for a yearly data series of 30 min frequency observations


I want to create a ts() object from a dataframe for forecasting physical phenomena.

My data has a 30-minute frequency over a period of 1 year (1-1-2018 to 12-31-2018). I have also observed that it has a seasonality of 1 day.

> head(pleiadesGH.v2[,c("time", "humExt.R", "tempExt", "radExt", "vientoVelo")])
                 time humExt.R  tempExt    radExt vientoVelo
1 2018-01-01 00:00:00       NA       NA        NA         NA
2 2018-01-01 00:30:00 36.78287 16.95125 -10.08125    3.68550
3 2018-01-01 01:00:00 38.56775 16.26350  -9.75000    2.38420
4 2018-01-01 01:30:00 38.76425 15.63470 -10.08125    2.71915
5 2018-01-01 02:00:00 39.61575 15.32030 -10.41250    3.70475
6 2018-01-01 02:30:00 37.48700 15.06485 -10.74375    2.51895

Based on these references:

https://robjhyndman.com/hyndsight/seasonal-periods/

time series with 10 min frequency in R

I concluded that my ts() frequency should be 48, because 1 day has 48 observations.

ts.freq1 <- ts(data = pleiadesGH.v2[,2:ncol(pleiadesGH.v2)],
           start = c(2018),
           frequency = 48)

But the resulting ts() has a wrong time index, as you can see below. The time index should run from 2018 to 2019, but it ends near 2400 instead.

Time Series:
Start = c(2018, 1) 
End = c(2383, 1) 
Frequency = 48 
          humInt.R    humInt.E  tempInt   tempMac  humExt.R    humExt.E     radExt  tempExt vientoVelo
2018.000        NA          NA       NA        NA        NA          NA         NA       NA         NA
2018.021        NA          NA       NA        NA  36.78287 0.004410894  -10.08125 16.95125  3.6855000
2018.042        NA          NA       NA        NA  38.56775 0.004427114   -9.75000 16.26350  2.3842000
2018.062        NA          NA       NA        NA  38.76425 0.004273306  -10.08125 15.63470  2.7191500
2018.083        NA          NA       NA        NA  39.61575 0.004280005  -10.41250 15.32030  3.7047500
2018.104        NA          NA       NA        NA  37.48700 0.003982139  -10.74375 15.06485  2.5189500
2018.125        NA          NA       NA        NA  35.84950 0.003735063  -10.41250 14.77010  3.2235000
2018.146        NA          NA       NA        NA  36.68462 0.003697674   -8.75625 14.25920  1.4409500
2018.167        NA          NA       NA        NA  41.48250 0.003954404  -11.07500 13.39460  1.5064000
2018.188        NA          NA       NA        NA  42.54688 0.003968433   -9.41875 13.06055  3.6701000
2018.208        NA          NA       NA        NA  43.05450 0.003969581   -9.08750 12.88370  1.6103500
2018.229        NA          NA       NA        NA  44.11888 0.004000366   -9.41875 12.62825  1.3485500
2018.250        NA          NA       NA        NA  46.26400 0.004061953   -9.08750 12.13700  1.9491500
2018.271        NA          NA       NA        NA  46.88625 0.004084874   -9.08750 12.01910  2.0569500
2018.292        NA          NA       NA        NA  49.57175 0.004187059   

[Figure: wrong plot due to time index]
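The End value in the output above follows from how ts() builds its index: every 48 observations advance the index by one time unit. Here is a minimal sketch with synthetic data. The random values are placeholders, and the length assumes 365 days of half-hourly readings plus one final reading, which is consistent with End = c(2383, 1).

```r
# Synthetic stand-in for one column of the data (values are placeholders)
n <- 365 * 48 + 1                      # assumed number of half-hourly observations
x <- ts(rnorm(n), start = 2018, frequency = 48)

# tsp() returns c(start, end, frequency) in ts time units
tsp(x)

# The integer part of the index advances once per 48 observations,
# so 365 days of data span 365 time units: 2018 + 365 = 2383
end(x)
```

In other words, with start = 2018 each day is counted as if it were a year.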

I also tried with this frequency:

ts.freq1 <- ts(data = pleiadesGH.v2[,2:ncol(pleiadesGH.v2)],
           start = c(2018),
           frequency =  365.25*24*60/30 )

This gives the following result:

Time Series:
Start = c(2018, 1) 
End = c(2018, 17521) 
Frequency = 17532 
          humInt.R    humInt.E  tempInt   tempMac  humExt.R    humExt.E     radExt  tempExt vientoVelo
2018.000        NA          NA       NA        NA        NA          NA         NA       NA         NA
2018.000        NA          NA       NA        NA  36.78287 0.004410894  -10.08125 16.95125  3.6855000
2018.000        NA          NA       NA        NA  38.56775 0.004427114   -9.75000 16.26350  2.3842000
2018.000        NA          NA       NA        NA  38.76425 0.004273306  -10.08125 15.63470  2.7191500
2018.000        NA          NA       NA        NA  39.61575 0.004280005  -10.41250 15.32030  3.7047500
2018.000        NA          NA       NA        NA  37.48700 0.003982139  -10.74375 15.06485  2.5189500
2018.000        NA          NA       NA        NA  35.84950 0.003735063  -10.41250 14.77010  3.2235000
2018.000        NA          NA       NA        NA  36.68462 0.003697674   -8.75625 14.25920  1.4409500
2018.000        NA          NA       NA        NA  41.48250 0.003954404  -11.07500 13.39460  1.5064000
2018.001        NA          NA       NA        NA  42.54688 0.003968433   -9.41875 13.06055  3.6701000
2018.001        NA          NA       NA        NA  43.05450 0.003969581   -9.08750 12.88370  1.6103500
2018.001        NA          NA       NA        NA  44.11888 0.004000366   -9.41875 12.62825  1.3485500
2018.001        NA          NA       NA        NA  46.26400 0.004061953   -9.08750 12.13700  1.9491500

But this implicitly means that my seasonality is yearly, which is not what I want. In the following picture you can see that the time index is now correct, despite the wrong seasonality.

[Figure: good index, incorrect seasonality]

What am I doing wrong?


Solution

  • The solution is as follows:

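    freq.daily <- 48 # 24 hours * 2 obs per hour = one seasonal period (1 day)
    
    # start = 1 makes the time unit a day number instead of a calendar year
    ts.daily <- ts(data = pleiadesGH.v2.interp[, 2:ncol(pleiadesGH.v2.interp)],
                   start = 1,
                   frequency = freq.daily)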
    
    Time Series:
    Start = c(1, 1) 
    End = c(366, 1) 
    Frequency = 48 
                humInt.R    humInt.E  tempInt   tempMac  humExt.R    humExt.E      radExt
      1.000000  74.56250 0.007699896 14.53500 13.625000  36.78287 0.004410894  -10.081250
      1.020833  74.56250 0.007699896 14.53500 13.625000  36.78287 0.004410894  -10.081250
      1.041667  74.56250 0.007699896 14.53500 13.625000  38.56775 0.004427114   -9.750000
      1.062500  74.56250 0.007699896 14.53500 13.625000  38.76425 0.004273306  -10.081250
      1.083333  74.56250 0.007699896 14.53500 13.625000  39.61575 0.004280005  -10.412500
      1.104167  74.56250 0.007699896 14.53500 13.625000  37.48700 0.003982139  -10.743750
    

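    This works because ts() does not store calendar dates: its time index simply counts seasonal periods from the start value. With frequency = 48, one time unit is one day, so starting from 1 gives an index that runs from day 1 to day 366. With start = 2018, each day was counted as a "year", which is why the index in the question ended at 2383 (2018 + 365).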
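If calendar timestamps are still needed (for example, to label a plot), they can be recovered from the day-based index. The sketch below uses synthetic data and base R only. The names stamps, day_index, half_hour and recovered are illustrative, and the UTC time zone is an assumption.

```r
# Synthetic stand-in for one year of half-hourly data (placeholder values)
stamps <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
              by = "30 min", length.out = 365 * 48 + 1)
x <- ts(rnorm(length(stamps)), start = 1, frequency = 48)

# time(x) is measured in days: 1 is the start of day 1, 1.5 is noon of day 1
day_index <- floor(time(x) + 1e-8)     # small offset guards against rounding
half_hour <- cycle(x)                  # position within the day, 1..48

# Map the ts index back to calendar time (86400 seconds per day)
recovered <- stamps[1] + (as.numeric(time(x)) - 1) * 86400
```

Here end(x) is c(366, 1), matching the output above, and recovered agrees with stamps up to floating-point rounding.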