Search code examples
rggplot2logistic-regressionchemistrydrc

Plotting a sigmoidal curve to get the Kd value in a biochemical affinity experiment


I have measured the binding affinity of several biomolecules to each other (a ligand and a target). This ideally results in a sigmoidal curve of the data. The inflection point represents the Kd value (dissociation constant), which is an indicator for the binding strength between said molecules.

The x axis is the ligand concentration and the y axis is the observed response, Fnorm or other measure (usually a number between 1 and 1000). This attached image shows what it looks like:

example graph 1

  1. I have no issue plotting my data points in ggplot2 and actually getting a decent graph. However, I also need the actual sigmoidal curve going through the data points (with a 95% CI around it).

  2. In some cases the ligand can't be concentrated any higher, meaning that a part of one of the "plateaus" of the curve is missing. I would like to have a sigmoidal curve that tries to "guess" or estimate what the full data distribution looks like, even if a small part of it is missing. This attached image shows what I mean by that:

example graph 2

Notice how the lower right part of the curve doesn't just end with the data points. But instead it "approximates" or "guesses" the rest of the sigmoidal curve from the last data points.

  1. Therefore, I would also like to have some kind of goodness-of-fit measure I can show, to let people know how well this sigmoidal model fits/describes the data. Not sure if something like the McFaddens´s R2 value would be good, but something like this.

  2. Finally, I of course also need the inflection point as an output (= a value on the x axis).

I have found somewhat similar questions, but they are not exactly what I need and I have failed adapting the solutions to me:

This is the dummy example data of how a triplicate affinity measurement might look like:

ligand-conc (x): 5.289E-09, 1.058E-08, 2.115E-08, 4.231E-08, 8.462E-08, 1.692E-07, 3.385E-07, 6.769E-07, 1.354E-06, 2.708E-06, 5.415E-06, 1.083E-05, 2.166E-05, 4.332E-05, 8.665E-05, 1.733E-04

Fnorm exp.a (y): 792.6444, 792.8537, 788.0273, 793.9693, 792.3848, 792.311, 790.5109, 790.4974, 796.1723, 790.8627, 790.2954, 784.7171, 773.0447, 760.8085, 745.5512, 738.3463

Fnorm exp.b (y): 790.2453, 793.8565, 789.5286, 791.8368, 788.5138, 790.0382, 792.85, 789.1439, 790.3487, 792.1872, 786.6738, 780.0627, 775.8658, 762.8376, 747.4288, 737.8717

Fnorm exp.c (y):  788.2453, 790.5648, 792.8529, 790.1368, 793.5138, 791.7038, 788.85, 791.1439, 789.4487, 788.8872, 789.5674, 783.3063, 774.8658, 764.5838, 749.4288, 736.5872

Here is what it looks like in an excel format: excel file

This is the code I have used thus far:

mydata <- read.csv("example")

names(mydata)[names(mydata) == "ligand.conc"] <- "ligand" #different name of a column for convenience
mydata$ligand <- as.numeric(mydata$ligand)
mydata$ligand <- mydata$ligand*1000 #changing the unit of the concentration from M to mM
mydata$ligand <- mydata$ligand*1000 #changing the unit of the concentration from mM to µM
mydata$Fnorm <- as.numeric(mydata$Fnorm)

base = ggplot(mydata)

base + 
  geom_point(aes(x=concentration, y=Fnorm, color=experiment))+

  geom_smooth(aes(x=ligand, y=Fnorm),
  method = drm, method.args = list(fct = L.4()), se = FALSE)+

  theme_bw() +
  theme(                                                 
    axis.line.x.bottom = element_line(color = 'black'), 
    axis.line.y.left   = element_line(color = 'black'),
    axis.line.y.right  = element_line(color = 'black'),
    panel.grid.minor.x = element_blank(),
    panel.border       = element_blank(),
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    axis.minor.ticks.length = rel(1),
    axis.text = element_text(color = "black",
                             size = 10),
    axis.ticks=element_line(linewidth=0.6),
    axis.ticks.length = unit(2.75, "pt"),
    ) +

  scale_color_manual(
    name="Replicate",
    labels = c("1", "2", "3"), 
    values = c("sienna1", "dodgerblue","grey43")) +
  
  coord_cartesian(ylim = c(720, 800), expand = TRUE)+
  
  scale_x_continuous(trans="log10",
                    expand = c(0, 0),
                    label = label_number(),
                    breaks = c(0.01, 0.1, 1, 10, 100, 1000),
                    guide = guide_axis_logticks(long = 2.3, mid = 1.65, short = 0.75),
                    limits = c(0.001,1000))+
  
  labs(
    y="Fnorm [%<sub>280</sub>]",
    x="Ligand conc. [µM]"
  )+

print(base)

This yields the following plot:

my graph

The curve appears very rough/jagged and not very smooth. As described above, the curve is incomplete as it would have to guess what the lower right part looks like. It is also missing the 95% CI. Further, I am not sure how to get the inflection point and goodness-of-fit outputs.

As of yet I have used the drc package (as described here), but this calculates a so-called ED50 value; I am unsure whether this equals the inflection point and thus Kd.

In order to get the 95% CI around the curve, I have tried to replace

geom_smooth(aes(x=ligand, y=Fnorm),
  method = drm, method.args = list(fct = L.4()), se = FALSE)+

with

geom_smooth(aes(x=ligand, y=Fnorm),
  method = drm, method.args = list(fct = L.4()), level=0.95)+

however, this produces this error:

geom_smooth() using formula = 'y ~ x' Failed to fit group -1. Caused by error in pred$fit: $ operator is invalid for atomic vectors

I am also unsure why I have to specify aes (x, y) twice in a row in the same ggplot. Adding geom_smooth() to the plot, without the aes() part, returns the error:

stat_smooth() requires the following missing aesthetics: x and y.

Adding geom_smooth(aes(x,y)) even just a little further down in the code produces:

Failed to fit group -1. Caused by error in method(): Convergence failure: singular convergence (7)"


Solution

  • Here's what I have so far. While you can use fitting methods like drc::drm() within ggplot, it's usually easier (in my experience) to apply them externally, then feed the results into ggplot.

    set up data

    lvec <- c(5.289E-09, 1.058e-08, 2.115e-08, 4.231e-08, 8.462e-08, 1.692e-07,
              3.385e-07, 6.769e-07, 1.354e-06, 2.708e-06, 5.415e-06, 1.083e-05,
              2.166e-05, 4.332e-05, 8.665e-05, 1.733e-04)
    expa <- c(792.6444, 792.8537, 788.0273, 793.9693, 792.3848, 792.311, 790.5109, 790.4974, 796.1723, 790.8627, 790.2954, 784.7171, 773.0447, 760.8085, 745.5512, 738.3463)
    expb <- c(790.2453, 793.8565, 789.5286, 791.8368, 788.5138, 790.0382, 792.85, 789.1439, 790.3487, 792.1872, 786.6738, 780.0627, 775.8658, 762.8376, 747.4288, 737.8717)
    expc <- c(788.2453, 790.5648, 792.8529, 790.1368, 793.5138, 791.7038, 788.85, 791.1439, 789.4487, 788.8872, 789.5674, 783.3063, 774.8658, 764.5838, 749.4288, 736.5872)
    dd <- data.frame(conc=lvec*1000,
                     Fnorm = c(expa, expb, expc),
                     experiment = rep(c("a","b","c"), each = 16))
    

    fit model and generate prediction with std errors

    library(drc)
    
    fit1 <- drm(Fnorm ~ conc, data = dd, fct = LL.4())
    ## explicitly ask for predictions over a wider range than the data:
    ## these values specify the range of the predictions
    minval <- 4e-6; maxval <- 10
    pframe <- data.frame(
        conc = exp(seq(log(minval), log(maxval), length.out = 101)))
    pframe <- cbind(pframe,
                    predict(fit1, newdata = pframe, se.fit = TRUE))
    names(pframe)[2:3] <- c("Fnorm", "Fnorm_se")
    

    plot

    library(ggplot2)
    ggplot(dd, aes(conc, Fnorm)) +
        geom_point(aes(colour = experiment)) +
        geom_line(data = pframe, colour = "black") +
        geom_ribbon(data = pframe, 
                    colour = NA,     ## no line at edges of ribbon
                    fill = "black",  ## colour of ribbon
                    alpha = 0.1,     ## make ribbon semi-transparent
                    aes(ymin = Fnorm - 2*Fnorm_se,
                        ymax = Fnorm + 2*Fnorm_se)) +
        scale_x_log10() +
        geom_hline(yintercept = coef(fit1)[2], lty = 2) +
        geom_vline(xintercept = coef(fit1)[4], lty = 2)
    

    (modified from copilot): A graph with a scatter plot and a fitted curve. The x-axis shows concentration and is on a logarithmic scale, ranging from 1e-04 to 1e+00. The y-axis shows the binding affinity and ranges approximately from 720 to 800. There are three sets of data points from different experiments (a, b, c) represented by different colors. These data points are scattered across the graph but generally follow a descending trend from left to right. A black decreasing sigmoidal curve goes through the data points, with a shaded ribbon showing ± 2SE. Dashed lines show the inflection point and the lower bound.

    The inflection point is coef(fit1)[4] = 0.0494.

    Note these are ± 2 SE confidence intervals but you're welcome to use ± 1.96 SE (or some appropriate t-distribution-based interval) instead.

    Not sure offhand about the goodness-of-fit statistic, although summary() does give you the residual standard error (from which you should be able to compute an R^2 value ...) Or compute cor(predict(fit1), dd$conc)^2 = 0.902.