I am analyzing meteorological data that is collected on a long term basis at regular intervals (every 15 to 60 minutes for most of the data). Temperature and other measures of the effect of solar radiation have daily cycles. I am trying to describe average exposure to solar radiation for any given day of the year, if the radiation is not blocked by clouds. I have access to multiple years worth of data, and I can average whatever data I put into R according to the day of the year. Some of the data needs to be thrown out before I do the averages in order to describe the average radiation of a cloudless day.
Apparently I don't have the reputation to post the graphic, but a graph of the radiation pattern of a cloudless day has a parabolic shape. Cloudy days can be identified by a curve with multiple peaks. The R^2 value of a quadratic regression could be used to differentiate the two types of days.
(Edit -- All of the radiation data and dates/times times are reported in two columns in a single text file. I have separated the data below by date to allow any reader to easily view the patterns I am trying to analyze, and because I don't know of a more sophisticated way to share the data and display the patterns.)
# The following vectors contain the dates and times of the readings, and the
# radiation recorded.
DateTime1<-c("13/10/23 07:00", "13/10/23 08:00", "13/10/23 09:00", "13/10/23 10:00", "13/10/23 11:00", "13/10/23 12:00", "13/10/23 13:00", "13/10/23 14:00", "13/10/23 15:00", "13/10/23 16:00", "13/10/23 17:00", "13/10/23 18:00", "13/10/23 19:00")
Sol.Rad1<-c(0, 68.78761823, 214.961307, 369.733448, 498.7102322, 576.0963027, 601.8916595, 541.7024936, 447.1195185, 352.5365434, 189.1659501, 8.598452279, 0)
DateTime2<-c("13/10/24 07:00", "13/10/24 08:00", "13/10/24 09:00", "13/10/24 10:00", "13/10/24 11:00", "13/10/24 12:00", "13/10/24 13:00", "13/10/24 14:00", "13/10/24 15:00", "13/10/24 16:00", "13/10/24 17:00", "13/10/24 18:00", "13/10/24 19:00")
Sol.Rad2<-c(0, 68.78761823, 214.961307, 369.733448, 498.7102322, 309.544282, 576.0963027, 386.9303525, 464.316423, 326.7411866, 167.6698194, 8.598452279, 0)
# The vector "Centered" is used to represent the time of day with the
# potential peak of radiation as the centered zero value. This vector allows
# for the quadratic regressions.
Centered<-c( -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6)
# Combine the vectors into data frames; one for each day.
day1<-data.frame(DateTime1,Centered,Sol.Rad1)
day2<-data.frame(DateTime2,Centered,Sol.Rad2)
# Plotting day1 shows the parabolic shape of a cloudless day
plot(day1$Sol.Rad1 ~ day1$Centered)
# Plotting day2 shows differences in the curve (two additional peaks) due to
# cloud cover.
plot(day2$Sol.Rad2 ~ day2$Centered)
# The R^2 values from a quadratic regression of day1 are close to 0.93.
qr1<- lm(day1$Sol.Rad ~ poly(day1$Centered, 2, raw=TRUE))
summary(qr1)
# While the R^2 values from day2 are less than 0.86.
qr2<- lm(day2$Sol.Rad ~ poly(day2$Centered, 2, raw=TRUE))
summary(qr2)
The differences in R^2 could be used as a way to distinguish cloudy days from sunny days, if I could find a way to repeat this process for each day within a larger data set.
Is there a way to do multiple quadratic regressions from a single data frame where either the dates and times, or the radiation readings for all days are reported within a single column.
Ideally, I would like to end up with a table with two columns. One column would contain the day of the year, and the second column would contain the R^2 value from the quadratic regression analysis. I think that either the Multiple R^2, or the Adjusted R^2 would work (but I don't know enough of the difference between the two versions of R^2 that I couldn't be persuaded to use one in favor over the other.)
I don't know how to only report the R^2 values from the quadratic regression analysis, or how to repeat the quadratic regressions as many times as days of data I am analyzing. I am potentially looking at 10 years of data, so being able to analyze and report the results of anaylses in a single table would be a fantastic way to sort which days of data I can use.
I'd do as following.
First of all, it is better to standardize column names.
Then the two data frames are binded using rbind()
.
As it's necessary to loop over dates, its type has been modified using as.Date()
- the two dates to loop is shown below.
Loop is done with lapply()
over unique dates and a data frame is created to keep both date
and adjusted R squared
- it's better to rely on this when there are more than 1 explanatory variables.
Finally do.call()
is used to bind individual outcomes.
# better to standardize column names
day1 <- data.frame(date = DateTime1, center = Centered, sol.rad = Sol.Rad1)
day2 <- data.frame(date = DateTime2, center = Centered, sol.rad = Sol.Rad2)
# bind them in a single data frame
days <- rbind(day1, day2)
# convert into date
days$date <- as.Date(days$date, "%y/%m/%d")
unique(days$date)
#[1] "2013-10-23" "2013-10-24"
# lapply recursive fit lm() and put date and adjusted R sqared in a data frame
# do.call bind them
do.call(rbind, lapply(unique(days$date), function(x) {
frame <- model.frame(sol.rad ~ poly(Centered, 2, raw = TRUE), data = days[days$date==x,])
qr <- lm(sol.rad ~ ., data = frame)
data.frame(date = x, r_sqrd = summary(qr)$adj.r.squared)
}))
#date r_sqrd
#1 2013-10-23 0.9232707
#2 2013-10-24 0.8293529