Tags: r, time-series, forecasting, moving-average

R - time series hourly


I have the following dataset of incoming calls per day within the hours from 3 p.m. to 10 p.m. which looks like this:

Date        hour  Count  Year  Month  Day
01.01.2001  15    69     2001  1      1
01.01.2001  16    12     2001  1      1
01.01.2001  17    56     2001  1      1
01.01.2001  18    34     2001  1      1
01.01.2001  19    44     2001  1      1
01.01.2001  20    91     2001  1      1
01.01.2001  21    82     2001  1      1
01.01.2001  22    49     2001  1      1
...
17.08.2003  22    103    2003  8      17

What needs to be done is a time series analysis, including forecasts, exponential smoothing, moving averages and so forth.

The problem I'm facing now is how to set up the call to ts(). I only have the peak hours from 3 p.m. to 10 p.m. available, so I can't declare the frequency as 24.

Can anybody help me out?

Many thanks, cheers,


Solution

    1) Assuming that the series starts at 3pm, that the days are consecutive, and that all hours from 3pm to 10pm are present:

    tser <- ts(DF[-1], freq = 8)
    

    giving:

    > tser
    Time Series:
    Start = c(1, 1) 
    End = c(1, 8) 
    Frequency = 8 
          hour Count Year Month Day
    1.000   15    69 2001     1   1
    1.125   16    12 2001     1   1
    1.250   17    56 2001     1   1
    1.375   18    34 2001     1   1
    1.500   19    44 2001     1   1
    1.625   20    91 2001     1   1
    1.750   21    82 2001     1   1
    1.875   22    49 2001     1   1
    

    This represents the index for day 1 3pm as 1.0, day 1 4pm as 1+1/8, day 1 5pm as 1+2/8, ..., day 1 10pm as 1+7/8, day 2 3pm as 2, day 2 4pm as 2+1/8, etc.
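
As a rough illustration of how a frequency-8 series like this can feed the moving average, exponential smoothing and forecasting steps mentioned in the question, here is a minimal sketch. The counts are synthetic (generated for the example, not taken from the question's data), and the choice of a centered one-day moving average and of HoltWinters() from base R's stats package is an assumption, not something prescribed by this answer.

```r
# Synthetic stand-in for DF$Count: 14 consecutive days x 8 peak hours (15:00-22:00).
# The values are made up purely so that the example runs.
set.seed(1)
n_days <- 14
hourly_shape <- c(60, 40, 50, 45, 55, 80, 75, 50)
count <- rep(hourly_shape, n_days) + rnorm(8 * n_days, sd = 5)

# One cycle is one day of 8 observations, as in approach 1.
tser <- ts(count, frequency = 8)

# Centered moving average over one full day (a 2x8 MA, because the period is even).
ma <- stats::filter(tser, c(0.5, rep(1, 7), 0.5) / 8, sides = 2)

# Holt-Winters exponential smoothing with an additive within-day pattern.
fit <- HoltWinters(tser)

# Forecast the next day, i.e. the next 8 peak hours.
fc <- predict(fit, n.ahead = 8)
```

Because the frequency is 8, "one season" here means one day of peak hours, so the seasonal component captures the within-day pattern rather than a weekly or yearly one.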

    2) This is the same, except that the day index is the number of days since 1970-01-01 instead of starting at 1:

    tser <- ts(DF[-1], start = as.Date("2001-01-01"), freq = 8)
    

    giving:

    > tser
    Time Series:
    Start = c(11323, 1) 
    End = c(11323, 8) 
    Frequency = 8 
             hour Count Year Month Day
    11323.00   15    69 2001     1   1
    11323.12   16    12 2001     1   1
    11323.25   17    56 2001     1   1
    11323.38   18    34 2001     1   1
    11323.50   19    44 2001     1   1
    11323.62   20    91 2001     1   1
    11323.75   21    82 2001     1   1
    11323.88   22    49 2001     1   1
    

    That is, this would represent each day as the number of days since 1970-01-01 plus, as before, 0, 1/8, ..., 7/8 for the hours.

    If you later need to regenerate the date/time then:

    library(chron)
    tt <- as.numeric(time(tser))
    as.chron(tt %/% 1) + (8 * (tt %% 1) + 15) / 24
    

    giving:

    [1] (01/01/01 15:00:00) (01/01/01 16:00:00) (01/01/01 17:00:00)
    [4] (01/01/01 18:00:00) (01/01/01 19:00:00) (01/01/01 20:00:00)
    [7] (01/01/01 21:00:00) (01/01/01 22:00:00)
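
If chron is not available, the same arithmetic can be sketched with base R's POSIXct. This assumes the index is in days since 1970-01-01 as in approach 2, and it treats the clock times as UTC, which is an assumption made only to avoid time zone shifts. The series values below are placeholders.

```r
# Stand-in series with the same time index as in approach 2 (values are placeholders).
tser <- ts(1:8, start = as.numeric(as.Date("2001-01-01")), frequency = 8)

tt <- as.numeric(time(tser))
day_part  <- tt %/% 1                    # whole days since 1970-01-01
hour_part <- round(8 * (tt %% 1)) + 15   # slot 0..7 within the day -> hour 15..22

# Seconds since the epoch, interpreted as UTC.
stamps <- as.POSIXct(day_part * 86400 + hour_part * 3600,
                     origin = "1970-01-01", tz = "UTC")
```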
    

    3) zoo. If it's not important to keep the observations equally spaced, then you could try this:

    library(zoo)
    library(chron)
    z <- zoo(DF[-1], as.chron(format(DF$Date), "%d.%m.%Y") + DF$hour/24)
    

    giving:

    > z
                        hour Count Year Month Day
    (01/01/01 15:00:00)   15    69 2001     1   1
    (01/01/01 16:00:00)   16    12 2001     1   1
    (01/01/01 17:00:00)   17    56 2001     1   1
    (01/01/01 18:00:00)   18    34 2001     1   1
    (01/01/01 19:00:00)   19    44 2001     1   1
    (01/01/01 20:00:00)   20    91 2001     1   1
    (01/01/01 21:00:00)   21    82 2001     1   1
    (01/01/01 22:00:00)   22    49 2001     1   1
    

    The zoo approach does not require that all hours be present, nor that the days be consecutive.
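
If some hours or days are missing but an equally spaced ts object is still wanted, one possible workaround is to pad the gaps with NA before calling ts(). The sketch below uses only base R. The data frame is hypothetical: the first day reuses the counts from the question, the second day is invented, and one row is dropped on purpose to create a gap.

```r
# Hypothetical data in the same layout as DF, with 17:00 on day 1 removed.
DF <- data.frame(
  Date  = rep(c("01.01.2001", "02.01.2001"), each = 8),
  hour  = rep(15:22, 2),
  Count = c(69, 12, 56, 34, 44, 91, 82, 49,   # day 1: from the question
            50, 20, 40, 30, 45, 85, 80, 60)   # day 2: invented for illustration
)[-3, ]
DF$day <- as.Date(DF$Date, "%d.%m.%Y")

# Full grid: every calendar day in the range crossed with every peak hour.
days <- seq(min(DF$day), max(DF$day), by = "day")
grid <- expand.grid(hour = 15:22, day = days)

# Left-join the observed counts onto the grid; missing slots become NA.
full <- merge(grid, DF[c("day", "hour", "Count")], all.x = TRUE)
full <- full[order(full$day, full$hour), ]

tser <- ts(full$Count, start = as.numeric(days[1]), frequency = 8)
```

Note that many modelling functions do not accept NA values, so the padded slots may still need to be imputed or otherwise handled before fitting.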

    Note: I am not sure that you really need the date and hour fields broken out separately, since they can easily be generated on the fly, so this might be enough:

    Count <- z$Count
    

    Year can be recovered via as.numeric(format(time(Count), "%Y")) and month, day and hour can be recovered by using %m, %d or %H in place of %Y.

    A list of the month, day and year columns can also be generated using month.day.year(time(Count)).

    years(time(Count)), months(time(Count)), days(time(Count)) and hours(time(Count)) will produce factors of the indicated quantities.