
Plotting exponential function returns excess lines


I am trying to fit a non-linear regression to a set of data. However, when plotted, R draws many different lines where there should be only one.

This problem is only reproducible in one data set, and I can't see any obvious difference between this data set and the others.

This is the code for my plot:

plot(df$logFC, df$log_pval, 
  xlim=c(0,11.1), ylim=c(0,11),
  xlab = "logFC", ylab = "p_val")

c <- df$logFC
d <- df$log_pval

model = nls(d ~ a*exp(b*c), start = list(a = 2,b = 0.1))

lines(c, predict(model), col = "dodgerblue", lty = 2, lwd = 2)

And here is a sample of my data (df):

logFC   log_pval
4.315   2.788
6.724   9.836
2.925   4.136
5.451   10.836
2.345   1.486
4.219   7.618

I have narrowed the problem down to the model, but I'm not sure where to go from there. Any help is greatly appreciated!


Solution

    1) ggplot method

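    I tried graphing the data with ggplot2 and I think the output is closer to what you were expecting. Base lines() joins the points in the order they appear in the data, so unsorted x values make the line double back on itself, whereas geom_line() joins them in order of x.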

    library(tibble)
    library(ggplot2)
    library(dplyr)
    
    # Create dataset
    df <- tibble::tribble(~logFC, ~log_pval,
                          4.315,   2.788,
                          6.724,   9.836,
                          2.925,   4.136,
                          5.451,   10.836,
                          2.345,   1.486,
                          4.219,   7.618)
    
    
    # Extract some vectors
    c <- df$logFC
    d <- df$log_pval
    
    # Your model
    model <- nls(d ~ a * exp(b * c), start = list(a = 2, b = 0.1))
    
    # Create second dataset for new plotting
    df2 <- tibble(logFC = c, log_pval = predict(model))
    
    # Plot output
    ggplot() + 
      geom_line(data = df2, aes(x = logFC, y = log_pval)) + 
      geom_point(data = df, aes(x = logFC, y = log_pval)) +
      theme_classic()
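
The fitted line above has only one point per observation, so with six rows it is drawn as straight segments. One possible refinement, sketched here under the assumption that the model is refit with `data = df` and the column names in the formula so that `predict()` can take `newdata`: evaluate the fit on an evenly spaced grid of x values. The names `fit`, `grid` and `p`, and the grid size of 200, are choices made for this illustration and are not part of the original answer.

```r
library(ggplot2)

# Same sample data as above, as a plain data frame
df <- data.frame(
  logFC    = c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219),
  log_pval = c(2.788, 9.836, 4.136, 10.836, 1.486, 7.618)
)

# Refit using the column names so that predict() accepts newdata
fit <- nls(log_pval ~ a * exp(b * logFC), data = df,
           start = list(a = 2, b = 0.1))

# Evenly spaced x values across the observed range (200 is arbitrary)
grid <- data.frame(logFC = seq(min(df$logFC), max(df$logFC),
                               length.out = 200))
grid$log_pval <- predict(fit, newdata = grid)

# Observed points plus the fitted curve evaluated on the grid
p <- ggplot(df, aes(x = logFC, y = log_pval)) +
  geom_point() +
  geom_line(data = grid, colour = "dodgerblue") +
  theme_classic()
print(p)
```

Because the grid is generated in increasing order, the same `grid` data frame can also be passed straight to base `lines()`.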
    


    2) base method

    If you want to stick with base graphics, sort the data by x before drawing the line, so that lines() joins the points from left to right:

    plot(df$logFC, df$log_pval, 
         xlab = "logFC", ylab = "p_val")
    
    df3 <- tibble(x = df$logFC, y = predict(model)) %>% dplyr::arrange(x)
    lines(df3$x, df3$y, col = "dodgerblue", lty = 1, lwd = 1)
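
The same reordering can be done without tibble or dplyr. A minimal base-R sketch, assuming the sample data and model from the question are rebuilt so that the block runs on its own (`fit` and `ord` are names introduced here for illustration): `order()` returns the permutation that sorts x, and applying it to both x and the predictions keeps each pair aligned.

```r
# Sample data from the question
df <- data.frame(
  logFC    = c(4.315, 6.724, 2.925, 5.451, 2.345, 4.219),
  log_pval = c(2.788, 9.836, 4.136, 10.836, 1.486, 7.618)
)

# Same exponential model, fitted with the column names
fit <- nls(log_pval ~ a * exp(b * logFC), data = df,
           start = list(a = 2, b = 0.1))

# The x values are not in increasing order, which is why lines() zigzags
is.unsorted(df$logFC)  # TRUE

# Permutation that sorts x; reuse it for the fitted values
ord <- order(df$logFC)

plot(df$logFC, df$log_pval, xlab = "logFC", ylab = "log_pval")
lines(df$logFC[ord], predict(fit)[ord],
      col = "dodgerblue", lty = 2, lwd = 2)
```

Indexing with `ord` leaves `df` itself untouched, so any later code that relies on the original row order is unaffected.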
    
