Search code examples
rggplot2regressionposixctnls

Fitting a sine wave model on POSIXt data and plotting using Ggplot2


Long-time reader, first-time asker here :)

I have some data collected at specific times and dates, and there is reason to hypothesize the data roughly follows a 24-hour cycle. I would like to fit a sine wave model on my data as a function of time, so that it is possible to test if future data points fall on the predicted pattern.

I have read this, this and this response but they are not solving my problem because in my case, I'm hoping to keep the x-axis data in POSIXct date-time format. That's how the data is collected and using this format makes for an easily interpreted plot.

Here's some reproducible data that is identical to my real data:

time <- c("2022-01-01 09:20:00", "2022-01-02 11:10:00", 
          "2022-01-02 18:37:00", "2022-01-03 14:01:00", 
          "2022-01-05 06:50:00", "2022-01-06 17:03:00")

time <- as.POSIXct(time)

value <- c(3, 6, 2, 8, 4, 1)

These are plotted fine in base R:

plot(time, value)

However, now I run into trouble when I try to construct a sine regression model that would fit the time series. I'm also struggling to fully understand the parameters required by nls function. Based on the previous examples, I have tried this approach (with comments on how I understand it working):

res <- nls(value ~ A * sin(omega * time + phi) + C,        # This is the basic sine-function format
           data = data.frame(time, value),                 # This defines the data used
           start = list(A = 1, omega = 1, phi = 1, C = 1)) # This gives nls the starting values?

Here, I get an error message: "Error in Ops.POSIXt(omega, time) : '*' not defined for "POSIXt" objects" which I interpret as meaning the specific date format I would like to use is not acceptable for this type of approach. I know this, because if I simply replace the time variable with a dummy vector of integers, the model works fine and I'm able to plot it as follows:

time2 <- c(1, 2, 3, 4, 5, 6)
res <- nls(value ~ A * sin(omega * time2 + phi) + C,
       data = data.frame(time, value),
       start=list(A=1, omega=1, phi=1, C=1))

coefs <- coef(res)

fit <- function(x, a, b, c, d) {a * sin(b * x + c) + d}

plot(time2, value)
curve(fit(x, a = coefs["A"], b = coefs["omega"], 
             c = coefs["phi"], d = coefs["C"]), add=TRUE, 
             lwd=2, col="red")

I know I'm on the right track but my main question is, how can I do the above process while maintaining the time variable in POSIXct format?

As mentioned, my main order of business would be to plot the data using Ggplot2, but I can't even begin to try that before I solve the initial problem. However, any pointers on how to get started with that are greatly appreciated! :)


Solution

  • I would probably just generate a numeric number of days from an arbitrary origin time and use that. You can then modify your fit function so that it converts date-times to predicted values. You can then easily make a data frame of predictions from your model and plot that.

    df <- data.frame(time = time, value = value)
    
    origin <- as.POSIXct("2022-01-01 00:00:00")
    
    df$days <- as.numeric(difftime(time, origin, unit = "day"))
    
    res <- nls(value ~ A * sin(omega * days + phi) + C,  
               data = df, 
               start = list(A = 1, omega = 1, phi = 1, C = 1))
    
    fit <- function(res, newdata) {
      
      x <- as.numeric(difftime(origin, newdata$time, units = "days"))
      C <- as.list(coef(res))
      C$A * sin(C$omega * x + C$phi) + C$C
    }
    
    new_df <- data.frame(time = origin + as.difftime(new_times, units = "days"))
    new_df$value <- fit(res, new_df)
    
    ggplot(df, aes(time, value)) +
      geom_point() +
      geom_line(data = new_df, colour = "gray") +
      theme_bw()
    

    enter image description here