Tags: r, ggplot2, linear-regression, interpolation

Get the fit of linear models between consecutive points in R


I have a task I can't figure out: I have a set of values measured at different dates, and I would like to obtain the values in between these dates by fitting linear models between consecutive points and extracting the fit. This is useful because I have another dataset that needs this value assigned depending on the date. Once I have the fit, the values will be assigned using a rolling join (this part already works).

Here is an example and what I tried:


dt1 <- read.table(text ="Date,Measure
2019-02-13 11:11:00,728.2172
2019-07-09 11:11:00,738.4000
2019-08-06 11:11:00,743.8530
2019-02-13 11:11:00,728.2100
2019-07-09 11:11:00,738.4000
2019-08-06 11:11:00,743.8500
2019-12-11 11:11:00,696.4650
2020-03-02 11:11:00,715.5200
2020-04-30 11:11:00,721.1650
2020-08-25 11:11:00,740.0000", header = TRUE, sep = ",", stringsAsFactors = FALSE)
str(dt1)
# Parse the timestamps as date-times in GMT
dt1$Date <- as.POSIXct(dt1$Date, tz = "GMT")

library(ggplot2)

p0 <- ggplot(data = dt1, aes(x = Date, y = Measure)) + geom_point() + geom_line() +
  labs(x="Date",y="Values")+
  scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
p0

[Figure: plot of the data sample]

The closest answer I found was "Method to extract stat_smooth line fit".

Following that suggestion, the first method (using ggplot_build(p1)):

p1<-ggplot(data=dt1, aes(x = Date, y = Measure))+ geom_point()+
  geom_smooth(method = "loess", span=0.4)+
  labs(x="Date",y="Values")+
  scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
p1

ggplot_build(p1)
fitdt1<- ggplot_build(p1)$data[[2]]
fitdt1$x<-as.POSIXct(fitdt1$x,origin = "1970-01-01", tz = "GMT")

p2<-ggplot(data=fitdt1, aes(x = x, y = y))+
  geom_point()
p2

...gives only 80 points, which is not precise enough:

[Figure: plot of the ggplot_build fit]

So I created the model manually, which lets me decide how many points to produce (although it gives warnings and errors with some data sets):

dt1$Date<-as.numeric(dt1$Date)

modelSlope <- loess(Measure~Date, data= dt1,span=0.4)


# Predict on a dense, evenly spaced grid covering the observed date range
xrangeSlope <- range(dt1$Date)
xseqSlope <- seq(from = xrangeSlope[1], to = xrangeSlope[2], length.out = 100000)
predSlope <- predict(modelSlope, newdata = data.frame(Date = xseqSlope), se = TRUE)
ySlope <- predSlope$fit
gam.DFslope <- data.frame(x = xseqSlope, ySlope)


gam.DFslope$x<-as.POSIXct(gam.DFslope$x,origin = "1970-01-01", tz = "GMT")
dt1$Date<-as.POSIXct(dt1$Date,origin = "1970-01-01", tz = "GMT")


p3<-ggplot()+
  geom_point(data=gam.DFslope, aes(x = x, y = ySlope),color="green")+
  geom_point(data=dt1, aes(x = Date, y = Measure),color="black")
  
p3

[Figure: plot of the manually created smooth model]

However, I'd like to have the same thing, but with the values of the linear models between the points (as you can see, the loess model doesn't fit well). The loess model also throws errors and seems not to work on some other data sets (too small a sample?).

Any suggestions on how to do that? Is there a way to use ggplot_build() on the geom_line layer? Thanks for any help you can give!


Solution

  • You are looking for linear interpolation between your known points. R has a built-in function for this, approx().

    p0<-ggplot(data=dt1, aes(x = Date, y = Measure))+ geom_point() +geom_line()+
       labs(x="Date",y="Values")+
       scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
    p0
    
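    # Linearly interpolate 100 evenly spaced points between min(x) and max(x).
    # To interpolate at specific locations instead, pass them via the 'xout' argument.
    # Duplicated dates are collapsed by averaging their values (the default ties = mean).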
    linearinter<-as.data.frame(approx(dt1$Date, dt1$Measure, n=100))
    linearinter$x <- as.POSIXct(linearinter$x, origin = "1970-01-01", tz="GMT")
    
    head(linearinter)
    #>                     x        y
    #> 1 2019-02-13 11:11:00 728.2136
    #> 2 2019-02-19 02:41:54 728.6076
    #> 3 2019-02-24 18:12:49 729.0015
    #> 4 2019-03-02 09:43:43 729.3955
    #> 5 2019-03-08 01:14:38 729.7894
    #> 6 2019-03-13 16:45:32 730.1834
    
    p0 + geom_line(aes(x, y), data=linearinter, col="red")
    

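Since the goal in the question is to assign a value to the rows of a second dataset by date, the 'xout' argument may be the more direct route: it interpolates exactly at the requested timestamps, so no dense grid is needed. The following is a minimal sketch under stated assumptions: `known` holds a subset of the question's data with one value per timestamp, and `other` is a hypothetical second dataset invented for illustration. Timestamps are passed as numeric seconds since the epoch, and `ties = mean` is spelled out to make the handling of duplicated dates explicit.

```r
# Known measurements (subset of the question's data, one value per timestamp)
known <- data.frame(
  Date = as.POSIXct(c("2019-02-13 11:11:00", "2019-07-09 11:11:00",
                      "2019-08-06 11:11:00", "2019-12-11 11:11:00"),
                    tz = "GMT"),
  Measure = c(728.2172, 738.4000, 743.8530, 696.4650)
)

# Hypothetical second dataset whose rows need an interpolated value
other <- data.frame(
  Date = as.POSIXct(c("2019-02-13 11:11:00", "2019-07-23 11:11:00",
                      "2019-12-11 11:11:00"),
                    tz = "GMT")
)

# Interpolate linearly at exactly the timestamps of 'other'
other$Measure <- approx(x = as.numeric(known$Date),
                        y = known$Measure,
                        xout = as.numeric(other$Date),
                        ties = mean)$y

other
```

With the default `rule = 1`, timestamps outside the range of the known dates come back as NA, so it is worth checking for missing values before using the result.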

    There is also the spline() function to add some curvature to the interpolation.
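As a rough illustration of the spline() alternative, the sketch below interpolates a subset of the question's data (one value per timestamp) on 200 points and converts the result back to date-times for plotting. The variable names `dates`, `measure`, `curved` and `at_knots` are chosen here for illustration only. Note that an interpolating spline still passes through the known points, but it can overshoot between them, so it is not a drop-in replacement when strictly piecewise-linear values are required.

```r
# Subset of the question's data, one value per timestamp
dates <- as.POSIXct(c("2019-08-06 11:11:00", "2019-12-11 11:11:00",
                      "2020-03-02 11:11:00", "2020-04-30 11:11:00",
                      "2020-08-25 11:11:00"),
                    tz = "GMT")
measure <- c(743.8530, 696.4650, 715.5200, 721.1650, 740.0000)

# Cubic spline interpolation on 200 evenly spaced points
curved <- as.data.frame(spline(as.numeric(dates), measure, n = 200))
curved$x <- as.POSIXct(curved$x, origin = "1970-01-01", tz = "GMT")

# Evaluating the spline at the known timestamps reproduces the measurements
at_knots <- spline(as.numeric(dates), measure, xout = as.numeric(dates))$y

head(curved)
```

The `curved` data frame has the same x/y layout as `linearinter`, so it could be added to the plot in the same way with geom_line().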