Search code examples
rdatetimeiterationnon-linear-regression

Multiple iterations of quadratic regression in R


I am analyzing meteorological data that is collected on a long term basis at regular intervals (every 15 to 60 minutes for most of the data). Temperature and other measures of the effect of solar radiation have daily cycles. I am trying to describe average exposure to solar radiation for any given day of the year, if the radiation is not blocked by clouds. I have access to multiple years worth of data, and I can average whatever data I put into R according to the day of the year. Some of the data needs to be thrown out before I do the averages in order to describe the average radiation of a cloudless day.

Apparently I don't have the reputation to post the graphic, but a graph of the radiation pattern of a cloudless day has a parabolic shape. Cloudy days can be identified by a curve with multiple peaks. The R^2 value of a quadratic regression could be used to differentiate the two types of days.

(Edit -- All of the radiation data and dates/times times are reported in two columns in a single text file. I have separated the data below by date to allow any reader to easily view the patterns I am trying to analyze, and because I don't know of a more sophisticated way to share the data and display the patterns.)

# The following vectors contain the dates and times of the readings, and the
# radiation recorded.
DateTime1<-c("13/10/23 07:00", "13/10/23 08:00", "13/10/23 09:00", "13/10/23 10:00", "13/10/23 11:00", "13/10/23 12:00", "13/10/23 13:00", "13/10/23 14:00", "13/10/23 15:00", "13/10/23 16:00", "13/10/23 17:00", "13/10/23 18:00", "13/10/23 19:00")
Sol.Rad1<-c(0, 68.78761823, 214.961307, 369.733448, 498.7102322, 576.0963027, 601.8916595, 541.7024936, 447.1195185, 352.5365434, 189.1659501, 8.598452279, 0)
DateTime2<-c("13/10/24 07:00", "13/10/24 08:00", "13/10/24 09:00", "13/10/24 10:00", "13/10/24 11:00", "13/10/24 12:00", "13/10/24 13:00", "13/10/24 14:00", "13/10/24 15:00", "13/10/24 16:00", "13/10/24 17:00", "13/10/24 18:00", "13/10/24 19:00")
Sol.Rad2<-c(0, 68.78761823, 214.961307, 369.733448, 498.7102322, 309.544282, 576.0963027, 386.9303525, 464.316423, 326.7411866, 167.6698194, 8.598452279, 0)

# The vector "Centered" is used to represent the time of day with the
# potential peak of radiation as the centered zero value.  This vector allows
# for the quadratic regressions.
Centered<-c( -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6)

# Combine the vectors into data frames; one for each day.
day1<-data.frame(DateTime1,Centered,Sol.Rad1)
day2<-data.frame(DateTime2,Centered,Sol.Rad2)

# Plotting day1 shows the parabolic shape of a cloudless day
plot(day1$Sol.Rad1 ~ day1$Centered)

# Plotting day2 shows differences in the curve (two additional peaks) due to
# cloud cover.
plot(day2$Sol.Rad2 ~ day2$Centered)

# The R^2 values from a quadratic regression of day1 are close to 0.93.
qr1<- lm(day1$Sol.Rad ~ poly(day1$Centered, 2, raw=TRUE))
summary(qr1)

# While the R^2 values from day2 are less than 0.86.
qr2<- lm(day2$Sol.Rad ~ poly(day2$Centered, 2, raw=TRUE))
summary(qr2)

The differences in R^2 could be used as a way to distinguish cloudy days from sunny days, if I could find a way to repeat this process for each day within a larger data set.

Is there a way to do multiple quadratic regressions from a single data frame where either the dates and times, or the radiation readings for all days are reported within a single column.

Ideally, I would like to end up with a table with two columns. One column would contain the day of the year, and the second column would contain the R^2 value from the quadratic regression analysis. I think that either the Multiple R^2, or the Adjusted R^2 would work (but I don't know enough of the difference between the two versions of R^2 that I couldn't be persuaded to use one in favor over the other.)

I don't know how to only report the R^2 values from the quadratic regression analysis, or how to repeat the quadratic regressions as many times as days of data I am analyzing. I am potentially looking at 10 years of data, so being able to analyze and report the results of anaylses in a single table would be a fantastic way to sort which days of data I can use.


Solution

  • I'd do as following.

    First of all, it is better to standardize column names.

    Then the two data frames are binded using rbind().

    As it's necessary to loop over dates, its type has been modified using as.Date() - the two dates to loop is shown below.

    Loop is done with lapply() over unique dates and a data frame is created to keep both date and adjusted R squared - it's better to rely on this when there are more than 1 explanatory variables.

    Finally do.call() is used to bind individual outcomes.

    # better to standardize column names
    day1 <- data.frame(date = DateTime1, center = Centered, sol.rad = Sol.Rad1)
    day2 <- data.frame(date = DateTime2, center = Centered, sol.rad = Sol.Rad2)
    
    # bind them in a single data frame
    days <- rbind(day1, day2)
    
    # convert into date
    days$date <- as.Date(days$date, "%y/%m/%d")
    unique(days$date)
    #[1] "2013-10-23" "2013-10-24"
    
    # lapply recursive fit lm() and put date and adjusted R sqared in a data frame
    # do.call bind them
    do.call(rbind, lapply(unique(days$date), function(x) {
      frame <- model.frame(sol.rad ~ poly(Centered, 2, raw = TRUE), data = days[days$date==x,])
      qr <- lm(sol.rad ~ ., data = frame)
      data.frame(date = x, r_sqrd = summary(qr)$adj.r.squared)
    }))
    
    #date    r_sqrd
    #1 2013-10-23 0.9232707
    #2 2013-10-24 0.8293529