
R error on timeseries


I have a script like the one below:

visit.total[with(visit.total, order(year, month)), ]

which produces a data frame like this:

   year month visits
1  2013     1 342145
3  2013     2 273182
5  2013     3 257748
7  2013     4 210831
9  2013     5 221381
11 2013     6 207591
13 2013     7 205367
15 2013     8 145731
17 2013     9 109211
19 2013    10  65376
21 2013    11  64409
23 2013    12  58557
2  2014     1  65307
4  2014     2  36134
6  2014     3  79041
8  2014     4 110980
10 2014     5 107926
12 2014     6  79518
14 2014     7  98927
16 2014     8 113064
18 2014     9  60171
20 2014    10  43687
22 2014    11  47601
24 2014    12  47296

When I run this code:

visit.total <- aggregate(data$visits,by=list(year=data$year,month=data$month), FUN=sum) #aggregate total visit 
colnames(visit.total)[3] <- "visits"
total.visit.ts <- ts(visit.total$visits, start=c(2013,1),frequency = 12)
total.visit.ts

it gives me the result below:

        Jan   Feb   Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2013 342145  65307 273182  36134 257748  79041 210831 110980 221381 107926 207591  79518
2014 205367  98927 145731 113064 109211  60171  65376  43687  64409  47601  58557  47296

Why is my data in a different order from the first output after I apply the ts function? Please advise.


Solution

  • It's hard to be sure without knowing more about what you're trying to do, but based on your code I would guess that you want a time series of the monthly visits over 2013 and 2014. The problem is that ts() reads visit.total$visits in the data frame's stored row order, not in the sorted order you printed. Your first statement only displays a sorted copy; since the result is never assigned back, visit.total keeps the order that aggregate() returned. Notice in your time series that the Jan 2013 value is correct, but the Feb 2013 value is actually the one for Jan 2014. The row numbers in the left-most column of your printout show that stored order: 01/2013 is row 1 and 01/2014 is row 2.

    This code, where I reproduced your data frame, should work:

    # Rebuild the example data: every (year, month) combination for 2013-2014
    year <- c(2013, 2014)
    month <- 1:12
    # Visits listed chronologically: Jan-Dec 2013, then Jan-Dec 2014
    visits <- c(342145, 273182, 257748, 210831, 221381, 207591, 205367, 145731, 109211, 65376, 64409, 58557,
                65307, 36134, 79041, 110980, 107926, 79518, 98927, 113064, 60171, 43687, 47601, 47296)
    visit.total <- merge(year, month)  # cross join of years and months
    colnames(visit.total) <- c("year", "month")
    # Sort chronologically and keep the result by assigning it back
    visit.total <- visit.total[order(visit.total$year, visit.total$month), ]
    # visits is already chronological, so it lines up with the sorted rows
    visit.total <- cbind(visit.total, visits)
    visit.total.ts <- ts(visit.total$visits, start = c(2013, 1), end = c(2014, 12), frequency = 12)
    

    You should see that the monthly visits are arranged properly by month and year.
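
    The same idea can be applied directly to the code in the question. The sketch below is only illustrative: the original data object is not shown, so a small made-up data frame (including a hypothetical part column) stands in for it. The only change to the question's code is that the sorted data frame is assigned back to visit.total before ts() is called.

    ```r
    # Hypothetical stand-in for the asker's raw `data` (the real one is not shown):
    # two rows per (year, month) so that aggregate() has something to sum.
    data <- expand.grid(month = 1:12, year = c(2013, 2014), part = 1:2)
    data$visits <- seq_len(nrow(data))

    # Same aggregation as in the question
    visit.total <- aggregate(data$visits,
                             by = list(year = data$year, month = data$month),
                             FUN = sum)
    colnames(visit.total)[3] <- "visits"

    # Assign the sorted data frame back; printing it alone leaves visit.total unchanged
    visit.total <- visit.total[with(visit.total, order(year, month)), ]

    # ts() now receives the visits in chronological order
    total.visit.ts <- ts(visit.total$visits, start = c(2013, 1), frequency = 12)
    total.visit.ts
    ```

    With this made-up data the monthly totals increase steadily over time, so any remaining ordering problem would be easy to spot in the printed series.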