Search code examples
rmachine-learningtime-seriessensorskalman-filter

R; Time series analysis on sensor data


I have a data frame of sensor data

I have a data frame as follows:

pressure    datetime
4.848374    2016-04-12 10:04:00   
4.683901    2016-04-12 10:04:32   
5.237860    2016-04-12 10:13:20 

Now, I would like to apply ARIMA to make predictive analytics.

Since the data is not sampled uniformly, I have aggregated it on Hourly basis which looks as follows:

datetime                    pressure
"2016-04-19 00:00:00 BST"   5.581806
"2016-04-19 01:00:00 BST"   4.769832
"2016-04-19 02:00:00 BST"   4.769832  
"2016-04-19 03:00:00 BST"   4.553711  
"2016-04-19 04:00:00 BST"   6.285599  
"2016-04-19 05:00:00 BST"   5.873414

The pressure for every hour looks like below:

enter image description here

But I can't create ts object as I am not sure what the frequency should be for Hourly data.


Solution

  • Your question has already been answered in the comment section, but just to reiterate, you should set the frequency to 24, as you want to forecast the hourly data:

    sensor = ts(hourlyPressure, frequency = 24)
    

    To your next point with regards to fixing the dates in your plot, lets start with some example data:

    ###Sequence of numbers to forecast    
    hourlyPressure<-c(1:24, 12:36, 24:48, 36:60)
    ###Sequence of Accompanying Dates
    dates<-seq(as.POSIXct("2016-04-19 00:00:00"), as.POSIXct("2016-04-23 02:00:00"), by="hour")
    

    Now we can set the hourlyPressure data to be a ts() object (let's ignore the dates for a minute)

    sensor <- ts(hourlyPressure, frequency=24)
    

    Now fit your arima model, in this example I will use the auto.arima function from the forecast package as finding the best arima model is not the focus of attention here (although using auto.arima() is a pretty robust way of finding the best arima model to fit your data):

    ###fit arima model to sensor data
    sensor_arima_fit<- auto.arima(sensor)
    

    You can then plot this data with the appropriate dates by just specifying the x value in the plot() function

    plot(y=sensor_arima_fit$x, x=dates)
    

    A little more difficult is when we forecast our data and want to plot the original data, with the forecasts and have the dates correct.

    ###now forecast ahead (lets say 2 days) using the arima model that was fit above
    forecast_sensor <- forecast(sensor_arima_fit, h = 48)
    

    Now to plot the original data, forecasts with the correct dates, we can do the following:

    ###set h to be the same as above
    h <- c(48)
    ###calculate the dates for the forecasted values
    forecasted_dates<-seq(dates[length(dates)]+(60*60)*(1), 
                      dates[length(dates)]+(60*60)*(h), by="hour")
    
    ###now plot the data
    plot(y=c(forecast_sensor$x, forecast_sensor$mean), 
         x=seq(as.POSIXct("2016-04-19 00:00:00"),as.POSIXct(forecasted_dates[length(forecasted_dates)]), by="hour"),
         xaxt="n", 
         type="l", 
         main="Plot of Original Series and Forecasts", 
         xlab="Date", 
         ylab="Pressure")
    
    ###correctly formatted x axis
    axis.POSIXct(1, at=seq(as.POSIXct("2016-04-19 00:00:00"), 
                           as.POSIXct(forecasted_dates[length(forecasted_dates)]), 
                           by="hour"), 
                 format="%b %d", 
                 tick = FALSE)
    

    This plots the original data with the forecasts and the dates are correct. However, just like the forecast package provides, perhaps we want the forecasts to be in blue.

    ###keep same plot as before
    plot(y=c(forecast_sensor$x, forecast_sensor$mean), 
         x=seq(as.POSIXct("2016-04-19 00:00:00"),as.POSIXct(forecasted_dates[length(forecasted_dates)]), by="hour"),
         xaxt="n", 
         type="l", 
         main="Plot of Original Series and Forecasts", 
         xlab="Date", 
         ylab="Pressure")
    
    axis.POSIXct(1, at=seq(as.POSIXct("2016-04-19 00:00:00"), 
                           as.POSIXct(forecasted_dates[length(forecasted_dates)]), 
                           by="hour"), 
                 format="%b %d", 
                 tick = FALSE)
    
    ###This time however, lets add a different color line for the forecasts
    lines(y=forecast_sensor$mean, x= forecasted_dates, col="blue")