I am hoping to efficiently combine my regressions using plyr functions. I have data frames with monthly data for multiple years, named in the format yDDDD (so y2014, y2013, etc.).
Right now, I have the code below for one of those data frames, y2014. I am running the regressions by month, as desired, within each year.
modelsm2= by(y2014,y2014$Date,function(x) lm(y~.,data=x))
summarym2=lapply(modelsm2,summary)
coefficientsm2=lapply(modelsm2,coef)
coefsm2v2=ldply(modelsm2,coef) #to get the coefficients into an exportable df
I have several things I'd like to do and I would really appreciate your help!
A. Extract the r^2 for each model. I know that for one model you can do summary(model)$r.squared to get it, but I have not had luck with my construct.
B. Apply the same methodology in a loop-type structure to get the models to run for all of my data frames (y2013 and backwards).
C. Get the summary into a format that is easily exportable to Excel; the ldply function does not work for the summaries.
Thanks again.
A. You need to extract the r.squared values from your summaries:
lapply(summarym2,"[[","r.squared")
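As a minimal sketch of the same idea on data anyone can run, the snippet below uses the built-in mtcars data set as a hypothetical stand-in for y2014, with cyl playing the role of Date. The names dat, models, summaries, r2 and r2_df are made up for illustration. Using sapply instead of lapply simplifies the result to a named numeric vector, which is easy to turn into a data frame:

```r
# Hypothetical stand-in for y2014: mtcars, with cyl playing the role of Date
dat <- mtcars[, c("mpg", "wt", "hp", "cyl")]

# One model per group, as in the question
models <- by(dat, dat$cyl, function(x) lm(mpg ~ wt + hp, data = x))
summaries <- lapply(models, summary)

# "[[" pulls the r.squared element out of each summary;
# sapply simplifies the result to a named numeric vector
r2 <- sapply(summaries, "[[", "r.squared")

# One row per group, ready for write.csv()
r2_df <- data.frame(group = names(r2), r.squared = unname(r2))
print(r2_df)
```

The same pattern with "adj.r.squared" in place of "r.squared" should give the adjusted values.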
B. Put all your data frames into a list and wrap another lapply around it, e.g.:
lmlist <- lapply(list(y2014,y2013,y2012), function(dat)
by(dat,dat$Date, function(x) lm(y~.,data=x))
)
You will then have a list of lists, so to extract the summaries, for instance, you would use:
lapply(lmlist,lapply,summary)
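Here is a rough, self-contained sketch of the nested approach. The data are simulated, and make_year, x1 and x2 are hypothetical names, since the real columns are not shown in the question. Two things differ from the snippet above and are my own assumptions: the list is named, so the year labels carry through to the results, and the predictors are spelled out rather than using y ~ ., because with y ~ . the Date column would also enter each per-month model even though it is constant within a month.

```r
set.seed(1)

# Hypothetical generator for one year of data: 3 months, 10 rows each
make_year <- function() {
  data.frame(Date = rep(month.abb[1:3], each = 10),
             x1 = rnorm(30), x2 = rnorm(30), y = rnorm(30))
}

# Naming the list elements keeps the year labels attached to the output
years <- list(y2014 = make_year(), y2013 = make_year())

# Outer lapply runs over years, by() runs over months within a year
lmlist <- lapply(years, function(dat)
  by(dat, dat$Date, function(x) lm(y ~ x1 + x2, data = x)))

# Nested sapply: one r-squared per month (rows) and year (columns)
r2 <- sapply(lmlist, function(models)
  sapply(models, function(m) summary(m)$r.squared))
print(r2)
```

Because every year here has the same months, the result simplifies to a matrix; with unequal months per year, sapply would return a list instead.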
C. summary returns a fairly complex data structure that cannot be coerced into a data.frame. The output you see is produced by its print method. You can use capture.output to get a character vector, one element per line of that output, which you can then write to a file.
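The sketch below shows both routes on simulated data. The first part writes the printed summaries to a text file with capture.output, as described above. The second part is an alternative not mentioned above: coef() applied to a summary returns the coefficient table as a matrix, which can be stacked into one data frame and written to a CSV file that Excel opens directly. The helper coef_table, the column names, and the file names are all hypothetical.

```r
set.seed(1)
dat <- data.frame(Date = rep(month.abb[1:3], each = 10),
                  x1 = rnorm(30), y = rnorm(30))
models <- by(dat, dat$Date, function(x) lm(y ~ x1, data = x))

# Route 1: save the printed summaries as plain text
txt_path <- file.path(tempdir(), "model_summaries.txt")
writeLines(capture.output(print(lapply(models, summary))), txt_path)

# Route 2: turn each coefficient table into a data frame
coef_table <- function(m) {
  s <- summary(m)
  tab <- coef(s)  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(term = rownames(tab),
             estimate = tab[, "Estimate"],
             std.error = tab[, "Std. Error"],
             t.value = tab[, "t value"],
             p.value = tab[, "Pr(>|t|)"],
             r.squared = s$r.squared,
             row.names = NULL)
}
tabs <- lapply(models, coef_table)  # plain named list of data frames

# Add the group label to each table, then stack them
out <- do.call(rbind, Map(function(id, tab) cbind(Date = id, tab),
                          names(tabs), tabs))
rownames(out) <- NULL

csv_path <- file.path(tempdir(), "model_summaries.csv")
write.csv(out, csv_path, row.names = FALSE)
```

Each row of out is one coefficient from one monthly model, with the model's r-squared repeated on every row for that month.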