Search code examples
rggplot2curve-fittingmodel-fitting

Fitting a sigmoidal curve to points with ggplot


I have a simple dataframe for the response measurements from a drug treatment at various doses:

drug <- c("drug_1", "drug_1", "drug_1", "drug_1", "drug_1", 
  "drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2", 
        "drug_2", "drug_2", "drug_2", "drug_2", "drug_2")

conc <- c(100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 
        0.05, 100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 0.05)

mean_response <- c(1156, 1833, 1744, 1256, 1244, 1088, 678, 489, 
        2322, 1867, 1333, 944, 567, 356, 200, 177)

std_dev <- c(117, 317, 440, 200, 134, 38, 183, 153, 719,
      218, 185, 117, 166, 167, 88, 50)

df <- data.frame(drug, conc, mean_response, std_dev)

I can plot these point using the following code and get the basic foundation of the visualization that I would like:

p <- ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
  geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
  scale_x_log10()

p

plot

The next thing I would like to do with these data is add a sigmoidal curve to the plot, that fits the plotted points for each drug. Following that, I would like to calculate the EC50 for this curve. I realize I may not have the entire range of the sigmoidal curve in my data, but I am hoping to get the best estimate I can with what I have. Also, the final point for drug_1 does not follow the expected trend of a sigmoidal curve, but this is actually not unexpected as the solutions that the drug is in can inhibit responses at high concentrations (each drug is in a different solution). I would like to exclude this point from the data.

I am getting stuck at the step of fitting a sigmoidal curve to my data. I have looked over some other solutions to fitting sigmoidal curves to data but none seem to work.

One post that is very close to my problem is this: (sigmoid) curve fitting glm in r

Based on it, I tried:

p + geom_smooth(method = "glm", family = binomial, se = FALSE)

This gives the following error, and seems to default to plotting straight lines:

`geom_smooth()` using formula 'y ~ x'
Warning message:
Ignoring unknown parameters: family 

I have also tried the solution from this link: Fitting a sigmoidal curve to this oxy-Hb data

In this case, I get the following error:

Computation failed in `stat_smooth()`:
Convergence failure: singular convergence (7) 

and no lines are added to the plot.

I have tried looking up both of these errors but cannot seem to find a reason that makes sense with my data.

Any help would be much appreciated!


Solution

  • As I said in a comment, I would only use geom_smooth() for a very easy problem; as soon as I run into trouble I use nls instead.

    My answer is very similar to @Duck's, with the following differences:

    • I show both unweighted and (inverse-variance) weighted fits.
    • In order to get the weighted fits to work, I had to use the nls2 package, which provides a slightly more robust algorithm
    • I use SSlogis() to get automatic (self-starting) initial parameter selection
    • I do all of the prediction outside of ggplot2, then feed it into geom_line()
    p1 <- nls(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
              subset=(drug=="drug_1" & conc<100)
            ## , weights=1/std_dev^2  ## error in qr.default: NA/NaN/Inf ...
              )
    
    library(nls2)
    p1B <- nls2(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
                subset=(drug=="drug_1" & conc<100),
                weights=1/std_dev^2)
    
    p2 <- update(p1,subset=(drug=="drug_2"))
    p2B <- update(p1B,subset=(drug=="drug_2"))
    
    pframe0 <- data.frame(conc=10^seq(log10(min(df$conc)),log10(max(df$conc)), length.out=100))
    pp <- rbind(
        data.frame(pframe0,mean_response=predict(p1,pframe0),
                   drug="drug_1",wts=FALSE),
        data.frame(pframe0,mean_response=predict(p2,pframe0),
                   drug="drug_2",wts=FALSE),
        data.frame(pframe0,mean_response=predict(p1B,pframe0),
                   drug="drug_1",wts=TRUE),
        data.frame(pframe0,mean_response=predict(p2B,pframe0),
                   drug="drug_2",wts=TRUE)
    )
    
    library(ggplot2); theme_set(theme_bw())
    (ggplot(df,aes(conc,mean_response,colour=drug)) +
     geom_pointrange(aes(ymin=mean_response-std_dev,
                         ymax=mean_response+std_dev)) +
     scale_x_log10() +
     geom_line(data=pp,aes(linetype=wts),size=2)
    )
    

    enter image description here

    I believe the EC50 is equivalent to the xmid parameter ... note the large differences between weighted and unweighted estimates ...