fitting a sine curve R

The values list below contains annual indices of sunspot activity from 1900-1923. I need to fit a sin curve to this data with a period of 10.4 years, with lm() in R. Despite reading other related posts, I couldn't figure this out yet. The dependent will be the observed values, but for the regressor sin() I am not sure how to set the period as 10.4 years. Could anyone give me a hint?

values <- c(113.5,32.9, 60.3,92.6,503.4,761.6,646.3,744.4, 582.5,526.6,223.0, 68.4, 43.1,17.3, 115.1,568.4,684.8, 1246.7,966.9,763.3,
451.7,313.6,170.9,69.3)

Solution

Linear vs. nonlinear models

lm() is made for models that are linear in parameters, which is not the case of sine models.
Such models can be estimate with nls(), which is made for nonlinear models.

Modeling

The general form of the model would be: y = a + b*sin(c + d*x)

With your hypothesis d would be around 2*pi/10.4, and we can set a and b to the average y value and c to 0.

This results in the following expected fit:

library(tidyverse)

df <- tibble(year = 1900:1923,
             values = values) %>%
  mutate(expected_fit = mean(values) + sin(year * (2*pi/10.4)) * mean(values))

ggplot() + 
  geom_point(data = df, aes(x = year, y = values), color = "blue") +
  geom_line(data = df, aes(x = year, y = expected_fit), color = "red")

Estimation

We can now use nls() to fit this model, specifying our hypothesized parameters for the function to start there.

res <- summary(nls(values ~ a + b*sin(c + d*year), data = df, 
                   start = list(a = mean(df$values), b = mean(df$values), 
                                c = 0, d = 2*pi/10.4)))$coefficients

res

    Estimate  Std. Error   t value     Pr(>|t|)
a 442.371777 27.12392790 16.309282 5.093534e-13
b 439.541330 35.88116525 12.249918 9.437063e-11
c  54.939557 26.43198489  2.078526 5.074681e-02
d   0.575082  0.01382716 41.590751 6.740719e-21

Visual check

We can now check that nls converged towards appropriate coefficients by computing and plotting the fitted values from this model.

df <- df %>%
  mutate(fit = res["a", 1]+res["b", 1]*sin(res["c", 1]+res["d", 1]*year))

ggplot() + 
  geom_point(data = df, aes(x = year, y = values), color = "blue") +
  geom_line(data = df, aes(x = year, y = fit), color = "red")