Search code examples
rggplot2curve-fittingdata-fittingmodel-fitting

Why is there a difference between do(lm...) and geom_smooth(method="lm")?


I have an external calibration curve that slightly goes into saturation. So I fit a polynomial of second order, and a dataframe of measured samples, of which I'd like to know the concentration.

df_calibration=structure(list(dilution = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 
0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1), 
    area = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 7800, 
    8200, 8500, 1200, 2200, 3200, 4200, 5200, 6200, 7200, 8000, 
    8400, 8700), substance = c("A", "A", "A", "A", "A", "A", 
    "A", "A", "A", "A", "b", "b", "b", "b", "b", "b", "b", "b", 
    "b", "b")), row.names = c(NA, 20L), class = "data.frame")

df_samples=structure(list(area = c(1100, 1800, 2500, 3200, 3900, 1300, 2000, 
2700, 3400, 4100), substance = c("A", "A", "A", "A", "A", "b", 
"b", "b", "b", "b")), row.names = c(NA, 10L), class = "data.frame")

To calculate now the actual dilutions from measured samples, I take the parameters generated from this fit:

df_fits=df_calibration %>% group_by(substance) %>% 
  do(fit = lm(area ~ poly(dilution,2), data = .))%>%
  tidy(fit) %>% 
  select(substance, term, estimate) %>% 
  spread(term, estimate)

df_fits=df_fits %>% rename(a=`poly(dilution, 2)2`,b=`poly(dilution, 2)1`,c=`(Intercept)`)

#join parameters with sample data
df_samples=left_join(df_samples,df_fits)

and this formula formula to calculate

#calculate with general solution for polynomial 2nd order
df_samples$dilution_calc=
  (df_samples$b*(-1)+sqrt(df_samples$b^2-(4*df_samples$a*(df_samples$c-df_samples$area))))/(2*df_samples$a) 

However, when I plot this now, I notice something very odd. The calculated x-values (dilutions) do not end up on the curve from stat_smooth(). The additional dotted line is put with the parameters from the equation in the graph (that match the numbers in the data frame) for substance "A". So my calculations should be correct (or not?) Why is there a difference? What am I doing wrong? How could I get parameters from the fit done by stat_smooth()?

my.formula=y ~ poly(x,2)
ggplot(df_calibration, aes(x = dilution, y = area)) +
  stat_smooth(method = "lm", se=FALSE, formula = my.formula) +

  stat_function(fun=function(x){5250+(7980*x)+(-905*x^2)},      
              inherit.aes = F,linetype="dotted")+

  stat_poly_eq(formula = my.formula, 
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), 
               parse = TRUE) +         
  geom_point(shape=17)+
  geom_point(data=df_samples,
           aes(x=dilution_calc,y=area),
           shape=1,color="red")+
  facet_wrap(~substance,scales = "free")

plot with odd behaviour

Any suggestion will be highly appreciated :-)


Solution

  • By default, poly computes orthogonal polynomials. You can turn orthogonalization off with the raw=TRUE argument.

    Note that the formula makes two appearances: once with the original variable names in fitting the regressions and then in stat_smooth using the generic variable names x and y. But otherwise it should be the same formula, with raw=TRUE.

    library("tidyverse")
    
    # Define/import your data here....
    
    df_fits <- df_calibration %>%
      group_by(substance) %>%
      do(fit = lm(area ~ poly(dilution, 2, raw = TRUE), data = .)) %>%
      broom::tidy(fit) %>%
      select(substance, term, estimate) %>%
      spread(term, estimate) %>%
      # It is simpler to rename the coefficients here
      setNames(c("substance", "c", "b", "a"))
    
    # join parameters with sample data
    df_samples <- left_join(df_samples, df_fits)
    
    # calculate with general solution for polynomial 2nd order
    df_samples <- df_samples %>%
      mutate(dilution_calc = (b * (-1) + sqrt(b^2 - (4 * a * (c - area)))) / (2 * a))
    
    my.formula <- y ~ poly(x, 2, raw = TRUE)
    
    df_calibration %>%
      ggplot(aes(x = dilution, y = area)) +
      stat_smooth(method = "lm", se = FALSE, formula = my.formula) +
      geom_point(shape = 17) +
      geom_point(
        data = df_samples,
        aes(x = dilution_calc, y = area),
        shape = 1, color = "red"
      ) +
      facet_wrap(~substance, scales = "free")
    

    Created on 2019-03-31 by the reprex package (v0.2.1)