I need a guide about how to do multiple regression columnwise. I have a data frame where I want to get each columns fitting coefficients separately. So far I can have those results for only one column.
What I tried so far
Maybe assigning the result to a new variable
(model.out1 <- lm(y1~x1)) (model.out2 <- lm(y2~x2))
may work but I don`t want to write several fitting equations lets say about 15 and column names each time. that is not the elegant solution.
2. using `apply` function
aa <- apply(df[4:8],2,fit_function)
fit_function <- function(x){nlsLM(x~ifelse(df$direc=="North"&V<J1, exp((-t_pw)/f0*exp(-del1*(1-V/J1)^2)),1)*ifelse(df$direc=="South"&V>J2, exp((-t_pw)/f0*exp(-del2*(1-V/J2)^2)),1)
,data=df,start=c(del1=5,J1=15,del2=1,J2=-5),trace=T)}
giving an error as usually we know
Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates
Maybe separating those columns and fit each one of them and combine fitting coefficients may work. But I dont have an idea how to do.
here is the reproducible data for your check the validity
df
direc <- rep(rep(c("North","South"),each=10),times=6)
V <- rep(c(seq(2,40,length.out=10),seq(-2,-40,length.out=10)),times=1)
DQ0 = c(replicate(2, sort(runif(10,0.001,1))))
DQ1 = c(replicate(2, sort(runif(10,0.001,1))))
DQ2 = c(replicate(2, sort(runif(10,0.001,1))))
DQ3 = c(replicate(2, sort(runif(10,0.001,1))))
DQ4 = c(replicate(2, sort(runif(10,0.001,1))))
group = c(replicate(1,rep(letters[1:6],each=20)))
df <- data.frame(group,direc,V,DQ0,DQ1,DQ2,DQ3,DQ4)
library(minpack.pl)
Since I want to do fitting for all columns DQ0,DQ1,DQ2,DQ3,DQ4 I wrote down this function.
fitting function
f0<-1e-9
t_pw<-3e-8
nls_fit=nlsLM(DQ0~ifelse(df$direc=="North"&V<J1, exp((-t_pw)/f0*exp(-del1*(1-V/J1)^2)),1)*ifelse(df$direc=="South"&V>J2, exp((-t_pw)/f0*exp(-del2*(1-V/J2)^2)),1)
,data=df,start=c(del1=5,J1=15,del2=1,J2=-5),trace=T)
and to get fitting results inside of each group.
df_new<- df%>%
group_by(group)%>%
do(data.frame(model=tidy(nls_fit)))%>%
select_("delta"="model.term","value"= "model.estimate")
how I can get the fitting results for DQ1,DQ2,DQ3 and DQ4 as table. maybe something like this is preferable
group delta value_DQ0 value_DQ1 value_DQ2 value_DQ3 value_DQ4
1 a del1 4.962564 * * * *
2 a J1 14.666667 * * * *
3 a del2 3.496986 * * * *
4 a J2 -14.468551
5 b del1 4.962564
6 b J1 14.666667
7 b del2 3.496986
8 b J2 -14.468551
9 c del1 4.962564
10 c J1 14.666667
.. ... ... ...
edit I found this Help with lm and multiple linear regression maybe I can do this by this
dat <- data.frame(x=1:10,y=rnorm(10),z=10:1)
lm(x~., data=dat)
but when I replace if else part with DQ0 like above I got this error
Probably I`m missing some part. Can you give some clear answer about this_? nay help will be appreciated.
First, I have serous doubts about your approach. As you probably know, non-linear regression is an iterative process for which success depends heavily on the choice of the starting estimates. Not only that, but you must consider the possibility of local minima, and of course you need to assess goodness-of-fit, for instance by looking at the p-values for the parameters and testing the residuals for normality. Your model function is fairly complex, so trying to automate a process like that is unlikely to yield results at all, and even if it does you have no guarantee the results will be meaningful. At the very least you would need to plot the data vs. the model function for all cases. Just generating a table like this is asking for trouble.
Second, your example has several problems. Your model function depends on t_pw
and f0
, which, AFAICT you do not define anywhere, and nlsLM(...)
is in package minpack.lm
, not minpack.pl
(I have not been able to find the latter anywhere).
Having said all that, I can see that you've put a lot of effort into formulating this question, and the basic issue: how to run non-linear regression against an arbitrary list of responses, with the dataset split group-wise, is interesting. Here is one way to do it using the mtcars
data set. In this example, the grouping variable is cyl
(number of cylinders), the response variables are mpg
, qsec
and hp
, and the (very simple) model function is: y ~ a * wt / (b + wt)
, with parameters a
and b
. So, for each cylinder category (4, 6, and 8) we model each of mpg
, qsec
and hp
as a function of wt
and determine a
and b
.
df <- mtcars # safer to make a copy
resp <- c("mpg","qsec","hp") # response variable names
library(minpack.lm) # for nlsLM(...)
get.coefs <- function(y,df) {
fit <- nlsLM(y~a*wt/(b+wt), data=data.frame(y=y,df), start=c(a=1,b=-1))
coef(fit)
}
coefs <- lapply(split(df,df$cyl),function(df) {do.call(cbind,lapply(df[resp],get.coefs,df))})
result <- do.call(rbind,lapply(names(coefs),function(x) {
data.frame(group=x, var=rownames(coefs[[x]]), coefs[[x]])
}))
result
# group var mpg qsec hp
# a 4 a 18.2436308 24.517564 98.80184109
# b 4 b -0.6655570 0.615073 0.42670565
# a1 6 a 14.2066098 62.179060 83.26572253
# b1 6 b -0.8599662 7.639224 -0.97768640
# a2 8 a 9.2212533 21.977518 204.59139171
# b2 8 b -1.4931033 1.213256 -0.08582505
In the code above, the function get.coefs(...)
takes a vector y
containing the response variable and a data.frame df
containing the dataset, runs the regression and returns a vector of the coefficients.
The line coefs <- ...
does most of the work. The inner lapply(...)
passes each column of responses to get.coefs(...)
in turn, and returns the result as a list. do.call(cbind,...)
assembles the list elements into a matrix of coefficients with the coefficient in rows and the response variable in columns. The outer lapply(...)
splits the original data.frame by group (cylinder in this case), and submits each grouped subset to the process described above. The result of all this, coefs
is a list of matrices, one for each group.
The last line: result <- ...
just reformats the coefs
list into the table you wanted.