Search code examples
rapplylapplynlstapply

How to Use nlsLM function together with one of the apply family function in R


I need a guide about how to do multiple regression columnwise. I have a data frame where I want to get each columns fitting coefficients separately. So far I can have those results for only one column.

What I tried so far

  1. Maybe assigning the result to a new variable

    (model.out1 <- lm(y1~x1)) (model.out2 <- lm(y2~x2))

may work but I don`t want to write several fitting equations lets say about 15 and column names each time. that is not the elegant solution.

 2. using `apply` function 

aa <- apply(df[4:8],2,fit_function)


    fit_function <- function(x){nlsLM(x~ifelse(df$direc=="North"&V<J1, exp((-t_pw)/f0*exp(-del1*(1-V/J1)^2)),1)*ifelse(df$direc=="South"&V>J2, exp((-t_pw)/f0*exp(-del2*(1-V/J2)^2)),1)
    ,data=df,start=c(del1=5,J1=15,del2=1,J2=-5),trace=T)}

giving an error as usually we know

Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates

Maybe separating those columns and fit each one of them and combine fitting coefficients may work. But I dont have an idea how to do.

here is the reproducible data for your check the validity
df

direc <- rep(rep(c("North","South"),each=10),times=6)
  V <- rep(c(seq(2,40,length.out=10),seq(-2,-40,length.out=10)),times=1)
  DQ0 = c(replicate(2, sort(runif(10,0.001,1))))
  DQ1 = c(replicate(2, sort(runif(10,0.001,1))))
  DQ2 = c(replicate(2, sort(runif(10,0.001,1))))
  DQ3 = c(replicate(2, sort(runif(10,0.001,1))))
  DQ4 = c(replicate(2, sort(runif(10,0.001,1))))
  group  =  c(replicate(1,rep(letters[1:6],each=20)))  
df <- data.frame(group,direc,V,DQ0,DQ1,DQ2,DQ3,DQ4)

library(minpack.pl)

Since I want to do fitting for all columns DQ0,DQ1,DQ2,DQ3,DQ4 I wrote down this function.

fitting function

 f0<-1e-9
 t_pw<-3e-8

nls_fit=nlsLM(DQ0~ifelse(df$direc=="North"&V<J1, exp((-t_pw)/f0*exp(-del1*(1-V/J1)^2)),1)*ifelse(df$direc=="South"&V>J2, exp((-t_pw)/f0*exp(-del2*(1-V/J2)^2)),1)
        ,data=df,start=c(del1=5,J1=15,del2=1,J2=-5),trace=T)

and to get fitting results inside of each group.

df_new<- df%>%
  group_by(group)%>%
  do(data.frame(model=tidy(nls_fit)))%>%
  select_("delta"="model.term","value"= "model.estimate")

how I can get the fitting results for DQ1,DQ2,DQ3 and DQ4 as table. maybe something like this is preferable

    group delta  value_DQ0  value_DQ1   value_DQ2  value_DQ3 value_DQ4 
1      a  del1   4.962564       *           *          *         * 
2      a    J1  14.666667       *           *          *         *
3      a  del2   3.496986       *           *          *         *
4      a    J2 -14.468551
5      b  del1   4.962564
6      b    J1  14.666667
7      b  del2   3.496986
8      b    J2 -14.468551
9      c  del1   4.962564
10     c    J1  14.666667
..   ...   ...        ...

edit I found this Help with lm and multiple linear regression maybe I can do this by this

dat <- data.frame(x=1:10,y=rnorm(10),z=10:1)
lm(x~., data=dat)

but when I replace if else part with DQ0 like above I got this error

Probably I`m missing some part. Can you give some clear answer about this_? nay help will be appreciated.


Solution

  • First, I have serous doubts about your approach. As you probably know, non-linear regression is an iterative process for which success depends heavily on the choice of the starting estimates. Not only that, but you must consider the possibility of local minima, and of course you need to assess goodness-of-fit, for instance by looking at the p-values for the parameters and testing the residuals for normality. Your model function is fairly complex, so trying to automate a process like that is unlikely to yield results at all, and even if it does you have no guarantee the results will be meaningful. At the very least you would need to plot the data vs. the model function for all cases. Just generating a table like this is asking for trouble.

    Second, your example has several problems. Your model function depends on t_pw and f0, which, AFAICT you do not define anywhere, and nlsLM(...) is in package minpack.lm, not minpack.pl (I have not been able to find the latter anywhere).

    Having said all that, I can see that you've put a lot of effort into formulating this question, and the basic issue: how to run non-linear regression against an arbitrary list of responses, with the dataset split group-wise, is interesting. Here is one way to do it using the mtcars data set. In this example, the grouping variable is cyl (number of cylinders), the response variables are mpg, qsec and hp, and the (very simple) model function is: y ~ a * wt / (b + wt), with parameters a and b. So, for each cylinder category (4, 6, and 8) we model each of mpg, qsec and hp as a function of wt and determine a and b.

    df   <- mtcars                # safer to make a copy
    resp <- c("mpg","qsec","hp")  # response variable names
    library(minpack.lm)           # for nlsLM(...)
    get.coefs <- function(y,df) {
      fit <- nlsLM(y~a*wt/(b+wt), data=data.frame(y=y,df), start=c(a=1,b=-1))
      coef(fit)
    }
    coefs  <- lapply(split(df,df$cyl),function(df) {do.call(cbind,lapply(df[resp],get.coefs,df))})
    result <- do.call(rbind,lapply(names(coefs),function(x) {
      data.frame(group=x, var=rownames(coefs[[x]]), coefs[[x]])
    }))
    result
    #    group var        mpg      qsec           hp
    # a      4   a 18.2436308 24.517564  98.80184109
    # b      4   b -0.6655570  0.615073   0.42670565
    # a1     6   a 14.2066098 62.179060  83.26572253
    # b1     6   b -0.8599662  7.639224  -0.97768640
    # a2     8   a  9.2212533 21.977518 204.59139171
    # b2     8   b -1.4931033  1.213256  -0.08582505
    

    In the code above, the function get.coefs(...) takes a vector y containing the response variable and a data.frame df containing the dataset, runs the regression and returns a vector of the coefficients.

    The line coefs <- ... does most of the work. The inner lapply(...) passes each column of responses to get.coefs(...) in turn, and returns the result as a list. do.call(cbind,...) assembles the list elements into a matrix of coefficients with the coefficient in rows and the response variable in columns. The outer lapply(...) splits the original data.frame by group (cylinder in this case), and submits each grouped subset to the process described above. The result of all this, coefs is a list of matrices, one for each group.

    The last line: result <- ... just reformats the coefs list into the table you wanted.