Tags: r, regression, non-linear-regression

How to find a curve that fits a series of points in R?


I need to find the equation of the power curve that fits the number of people contaminated per day by a certain disease, so that I can make a prediction. The data are as follows:

Day     Contaminated

26/feb  1
29/feb  2
04/mar  3
05/mar  8
06/mar  13
07/mar  19
08/mar  25
10/mar  34
11/mar  52
12/mar  81
13/mar  98
14/mar  121
15/mar  176
16/mar  234
17/mar  291
18/mar  428
19/mar  621
20/mar  904
21/mar  1128
22/mar  1546
23/mar  1891
24/mar  2201
25/mar  2433

I believe I need to do a power curve regression (nonlinear regression) in R, but I don't know how to implement it.


Solution

  • Here is an approach using data.table, ggplot2 and nls.

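    First, let's convert the dates (in the data object defined at the end of this answer) to the standard Date format, then to an integer count of days since the first observation so we can do some calculations.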

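    library(data.table)
    library(ggplot2)
    setDT(data)  # convert the data.frame to a data.table in place
    # The data give no year, and as.Date() would fill in the current one, so
    # "29/feb" becomes NA outside a leap year. 2020 is an assumed leap year.
    data[, Day := as.Date(paste0(Day, "/2020"), "%d/%b/%Y")]
    # Days elapsed since the first observation
    data[, Int := as.integer(Day) - min(as.integer(Day))]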
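
    One caveat with the conversion above: %b matches month abbreviations according to the session locale, and as.Date fills in the current year when none is given. The following base R sketch avoids both by mapping the abbreviations through the built-in month.abb constant. The year 2020 is an assumption (the data give no year, but 29/feb requires a leap year), and the names day_chr, dates and offsets are only illustrative.

```r
# Day strings exactly as they appear in the question
day_chr <- c("26/feb", "29/feb", "04/mar", "05/mar", "06/mar", "07/mar",
             "08/mar", "10/mar", "11/mar", "12/mar", "13/mar", "14/mar",
             "15/mar", "16/mar", "17/mar", "18/mar", "19/mar", "20/mar",
             "21/mar", "22/mar", "23/mar", "24/mar", "25/mar")

# Split "dd/mon" into day number and month abbreviation
parts <- strsplit(day_chr, "/", fixed = TRUE)
dd <- as.integer(vapply(parts, `[`, character(1), 1))

# month.abb is always English ("Jan", "Feb", ...), whatever the locale
mm <- match(vapply(parts, `[`, character(1), 2), tolower(month.abb))

# ASSUMPTION: the year is 2020 (a leap year); the source does not state it
dates <- as.Date(sprintf("2020-%02d-%02d", mm, dd))

# Days elapsed since the first observation
offsets <- as.integer(dates - min(dates))
print(offsets)
```

    If the year is different, only the string passed to sprintf needs to change; the offsets stay the same for any leap year.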
    

    Then we use nls to fit a power-law model to the data. We'll use the formula y = a * x ^ b, with a = 1 and b = 1 as starting values.

    nls(formula = Contaminated ~ a * Int ^ b, data,start=list(a=1,b=1))
    # Nonlinear regression model
    #  model: Contaminated ~ a * Int^b
    #   data: data
    #        a         b 
    #2.272e-05 5.571e+00 
    # residual sum-of-squares: 123279
    #
    #Number of iterations to convergence: 48 
    #Achieved convergence tolerance: 7.832e-07
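
    Since the question asks for a prediction, here is a minimal base R sketch of how the fitted model could be stored and used with predict. It rebuilds the day offsets by hand instead of using data.table; the names d, fit and future are illustrative, the start date assumes the year 2020 (the data give no year), and the larger maxiter is only extra headroom, not something the fit above required.

```r
# Day offsets (days since 26/feb, assuming a leap year) and counts from the question
d <- data.frame(
  Int = c(0, 3, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20,
          21, 22, 23, 24, 25, 26, 27, 28),
  Contaminated = c(1, 2, 3, 8, 13, 19, 25, 34, 52, 81, 98, 121, 176, 234,
                   291, 428, 621, 904, 1128, 1546, 1891, 2201, 2433)
)

# Same model and starting values as above, but kept in an object
fit <- nls(Contaminated ~ a * Int^b, data = d,
           start = list(a = 1, b = 1),
           control = nls.control(maxiter = 200))

# Predict the next three days after the last observation (offset 28)
future <- data.frame(Int = 29:31)
future$Predicted <- predict(fit, newdata = future)

# ASSUMPTION: day 0 is 2020-02-26; turn offsets back into calendar dates
future$Day <- as.Date("2020-02-26") + future$Int

print(coef(fit))
print(future)
```

    Predictions from a power curve grow without bound, so they are probably only reasonable a few days past the observed range.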
    

    Now we can view the results with ggplot.

    ggplot(data, aes(x = Int, y = Contaminated)) +
      geom_point() +
      # label the x axis with the dates that correspond to day 0, 10 and 20
      scale_x_continuous(breaks = c(0, 10, 20),
                         labels = data$Day[data$Int %in% c(0, 10, 20)]) + xlab("Date") +
      geom_smooth(method = "nls", formula = y ~ a * x ^ b,
                  method.args = list(start = c(a = 1, b = 1)), se = FALSE, linetype = 1)
    

    Data

    data <- structure(list(Day = c("26/feb", "29/feb", "04/mar", "05/mar", 
    "06/mar", "07/mar", "08/mar", "10/mar", "11/mar", "12/mar", "13/mar", 
    "14/mar", "15/mar", "16/mar", "17/mar", "18/mar", "19/mar", "20/mar", 
    "21/mar", "22/mar", "23/mar", "24/mar", "25/mar"), Contaminated = c(1L, 
    2L, 3L, 8L, 13L, 19L, 25L, 34L, 52L, 81L, 98L, 121L, 176L, 234L, 
    291L, 428L, 621L, 904L, 1128L, 1546L, 1891L, 2201L, 2433L)), class = "data.frame", row.names = c(NA, 
    -23L))