Search code examples
rfor-loopggplot2nls

Extract model parameters for nls model using for loop and graph results


I have a group of 10 subjects (using a subset of 3 of them below as a sample dataset). I have managed to create a for loop that runs through the 3 subjects and fits each one to a sigmoidal curve using nlsLM (package minpack.lm). nlsLM is the same as nls, it just uses a Levenberg-Marquardt algorithm instead of the Gauss-Newton algorithm used with the base nls function.

I have also manage to extract the model parameters (plateau, S50, slope) and put them in a dataframe. My code is as follows:

dfRC <- read_csv("data/sampleRCdf.csv") #read in the raw data files

# list of subjects of interest to loop through
sub <- unique(dfRC$subID)

# define start to obtain the number and name of fit parameters
start <- list(plateau=1, S50=1, slope=1)

# create empty data.frame to store IDs and parameters
params.pre <- data.frame(matrix(nrow = length(sub), ncol = 1+length(start)))
names(params.pre) <- c("sub", names(start))

# nested for loop that goes by subject (i)
for(i in seq_along(sub)) {
  # create data frame for sub "i"
    individual_DFs <- dfRC %>% filter (subID %in% sub[i])
  
  # fit model for each sub "i"
    fitpre.loop <- nlsLM(mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))), 
                         data = individual_DFs,
                         start = start, trace = TRUE)
    
  # store IDs
    params.pre[i,1] <- sub[i]
  # store fit parameters
    params.pre[i,2:ncol(params)] <- fitpre.loop$m$getPars()
}

params.pre

params.pre gives:

sub plateau     S50         slope
101 3.579751    6.505194    0.6930363   
202 2.506159    3.538753    0.8300668   
303 1.971020    5.888228    0.4806047   

This is for the following dataset (where 101, 202, and 303 are my sample subjects), where 101.y.pre is mepAMP_pre and x is state in the for loop above:

x = c(1,2,3,4,5,6,7,8,9,10,11)

101.y.pre <- c(0.38117575, 0.11300747, 0.37239499, 0.51321839, 0.56851893, 
              1.73259296, 2.08146847, 2.80090151, 3.04446933, 2.67647473, 3.87695509)

202.y.pre <- c(0.263931535, 0.554056564, 0.903243066, 1.758670072, 1.512232414,
              2.382228869, 2.744255537, 1.943985522, 2.642561877, 2.880719751, 2.139018852)

303.y.pre <- c(0.197647216, 0.095434883, 0.523944806, 0.625025631, 0.92489588, 
              0.898288637, 0.918388724, 1.433502882, 2.127665395, 1.649622992, 1.642610593)

I had a few questions.

  1. I did a sanity check where I ran each subject individually through nlsLM and made sure the output parameters from that matched the output from my for loop. They do! So, it works. However, I'm a little confused about the output of the individual_DFs dataframe. I thought it would show all 3 subjects, but when I open it up it just shows subject 303. Clearly the code works, but I'm confused why only 1 subject is showing up in the dataframe?

  2. You'll notice I named the param output as params.pre and use the mepAMP_pre values for the model - this is because I have both pre- and post-treatment data for each subject. The .csv file has the column headings "subID", "state", "mepAMP_pre", and "mepAMP_post". Is there a way to use a for loop to loop over both the pre and post values and spit out plateau_pre, plateau_post, S50_pre, S50_post, slope_pre, slope_post and pull them all out into a single dataframe like I currently have with params.pre with the different subjects as rows?

  3. Any advice on how I could graph it? I have managed to graph the pre- fit for one subject using the following code:

df101 <- data.frame(x, fit101.y)

p.sample <- ggplot(data = df101, aes(x = x, y = fit101.y)) +
  geom_point() +
  geom_smooth(method = "nls", 
              data = df101,
              formula = fit101.y ~ plateau / (1 + exp(slope*(S50 - x))), start = list( plateau=1, S50=1, slope=1), 
              se = FALSE)
p.sample

enter image description here

I'd like to maybe overlap the curves for each subject pre- and post-per subject in a grid-like pattern of figures. Or, perhaps graph all the pre- curves onto one figure and all post- curves onto another figure. Would appreciate any help with doing either of these!

---------------EDIT---------------

Thank you Allan for your help! Here is my full data frame (output from dput(dfRC))

structure(list(subID = c(101, 101, 101, 101, 101, 101, 101, 101, 
101, 101, 101, 202, 202, 202, 202, 202, 202, 202, 202, 202, 202, 
202, 303, 303, 303, 303, 303, 303, 303, 303, 303, 303, 303), 
    state = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 
    5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), 
    mepAMP_pre = c(0.38117575, 0.11300747, 0.37239499, 0.51321839, 
    0.56851893, 1.73259296, 2.08146847, 2.80090151, 3.04446933, 
    2.67647473, 3.87695509, 0.263931535, 0.554056564, 0.903243066, 
    1.758670072, 1.512232414, 2.382228869, 2.744255537, 1.943985522, 
    2.642561877, 2.880719751, 2.139018852, 0.197647216, 0.095434883, 
    0.523944806, 0.625025631, 0.92489588, 0.898288637, 0.918388724, 
    1.433502882, 2.127665395, 1.649622992, 1.642610593), mepAMP_post = c(0.126321776, 
    0.566816552, 0.374254417, 0.199486984, 0.510302018, 1.03651474, 
    1.697137046, 2.090100867, 3.448320717, 2.095180146, 2.897606435, 
    0.018846444, 0.041664734, 0.51243325, 0.961881685, 0.998366952, 
    2.082848001, 2.713030559, 3.373811346, 2.839989549, 3.283945894, 
    3.052075374, 0.232427913, 0.895619231, 1.194016429, 1.721528554, 
    2.249776715, 2.756416541, 4.716890788, 4.16244235, 4.757734573, 
    4.965043759, 4.732616496)), row.names = c(NA, -33L), spec = structure(list(
    cols = list(subID = structure(list(), class = c("collector_double", 
    "collector")), state = structure(list(), class = c("collector_double", 
    "collector")), mepAMP_pre = structure(list(), class = c("collector_double", 
    "collector")), mepAMP_post = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x10d8e15d0>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

I tried running the code you included and the updated params.pre worked great, however the plot came out weird and I'm not sure where I'm going wrong.

The code:

params.pre <- do.call("rbind", lapply(split(dfRC, dfRC$subID), function(d) {
  mod <- nlsLM(mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))), 
               data = d, start = list(plateau = 1, S50 = 1, slope = 1))
  as.data.frame(c(list(sub = d$subID[1]), as.list(coef(mod))))
}))

ggplot(dfRC, aes(state, mepAMP_pre, color = subID)) +
  geom_point() +
  geom_smooth(method = nlsLM, se = FALSE,
              formula = mepAMP_pre ~ plateau / (1 + exp(slope*(S50 - state))),
              method.args = list(start = list(plateau = 1, S50 = 1, slope = 1)))

The figure it outputs: enter image description here


Solution

  • You don't need to chop up dfRC to get the plot working. First pivot into long format:

    library(minpack.lm)
    library(tidyverse)
    
    dfRC_long <- dfRC %>%
      pivot_longer(contains('mep'), names_sep = '_', 
                   names_to = c('.value', 'prepost')) %>%
      mutate(subID = factor(subID), prepost = factor(prepost, c('pre', 'post')))
    

    Now your plotting code is just:

    ggplot(dfRC_long, aes(state, mepAMP, colour = subID)) +
      geom_point() +
      geom_smooth(method = nlsLM, se = FALSE, 
                  formula = y ~ plateau / (1 + exp(slope*(S50 - x))),
                  method.args = list(start = list(plateau = 1, S50 = 1, slope = 1))
                 ) +
      facet_grid(.~prepost)
    

    enter image description here

    Similarly, you can skip an explicit loop to generate your table using some tidyverse functions:

    dfRC_long %>%
      group_by(subID, prepost) %>%
      group_map(.f = ~ nlsLM(mepAMP ~ plateau / (1 + exp(slope*(S50 - state))), 
                             data = .x, 
                             start = list(plateau = 1, S50 = 1, slope = 1)) %>%
                  coef() %>%
                  t() %>%
                  as.data.frame() %>%
                  mutate(pre_or_post = .y$prepost, .before = 1) %>%
                  mutate(subID = .y$subID, .before = 2)) %>%
      bind_rows() %>%
      arrange(pre_or_post, subID)
    #>   pre_or_post subID  plateau      S50     slope
    #> 1         pre   101 3.579751 6.505194 0.6930363
    #> 2         pre   202 2.506159 3.538753 0.8300668
    #> 3         pre   303 1.971020 5.888228 0.4806047
    #> 4        post   101 2.874621 6.538601 0.9221484
    #> 5        post   202 3.225695 5.356826 0.9406343
    #> 6        post   303 5.084059 5.094672 0.6321269
    

    Created on 2023-08-16 with reprex v2.0.2