Search code examples
rtrigonometrymodelinglm

fitting a sine curve R


The values list below contains annual indices of sunspot activity from 1900-1923. I need to fit a sin curve to this data with a period of 10.4 years, with lm() in R. Despite reading other related posts, I couldn't figure this out yet. The dependent will be the observed values, but for the regressor sin() I am not sure how to set the period as 10.4 years. Could anyone give me a hint?

values <- c(113.5,32.9, 60.3,92.6,503.4,761.6,646.3,744.4, 582.5,526.6,223.0, 68.4, 43.1,17.3, 115.1,568.4,684.8, 1246.7,966.9,763.3,
451.7,313.6,170.9,69.3)

Solution

  • Linear vs. nonlinear models

    lm() is made for models that are linear in parameters, which is not the case of sine models.
    Such models can be estimate with nls(), which is made for nonlinear models.

    Modeling

    The general form of the model would be: y = a + b*sin(c + d*x)

    With your hypothesis d would be around 2*pi/10.4, and we can set a and b to the average y value and c to 0.

    This results in the following expected fit:

    library(tidyverse)
    
    df <- tibble(year = 1900:1923,
                 values = values) %>%
      mutate(expected_fit = mean(values) + sin(year * (2*pi/10.4)) * mean(values))
    
    ggplot() + 
      geom_point(data = df, aes(x = year, y = values), color = "blue") +
      geom_line(data = df, aes(x = year, y = expected_fit), color = "red")
    

    enter image description here

    Estimation

    We can now use nls() to fit this model, specifying our hypothesized parameters for the function to start there.

    res <- summary(nls(values ~ a + b*sin(c + d*year), data = df, 
                       start = list(a = mean(df$values), b = mean(df$values), 
                                    c = 0, d = 2*pi/10.4)))$coefficients
    
    res
    
        Estimate  Std. Error   t value     Pr(>|t|)
    a 442.371777 27.12392790 16.309282 5.093534e-13
    b 439.541330 35.88116525 12.249918 9.437063e-11
    c  54.939557 26.43198489  2.078526 5.074681e-02
    d   0.575082  0.01382716 41.590751 6.740719e-21
    

    Visual check

    We can now check that nls converged towards appropriate coefficients by computing and plotting the fitted values from this model.

    df <- df %>%
      mutate(fit = res["a", 1]+res["b", 1]*sin(res["c", 1]+res["d", 1]*year))
    
    ggplot() + 
      geom_point(data = df, aes(x = year, y = values), color = "blue") +
      geom_line(data = df, aes(x = year, y = fit), color = "red")
    

    enter image description here