Search code examples
rdplyrmodeltidymodels

Subset data into analysis and assessment data to generate and apply grouped prediction models in R


This question is related to the following but I am unable to make it work

Linear Regression model building and prediction by group in R

Fit a model on each group and evaluate it using data from all rows not in this group, R

Essentially I have a dataframe structured as below. This has been generated by analysing my different experimental conditions on a "multiplex" machine, essentially generating intensity readouts for multiple different "cytokines" of interest. As part of this test I used standard concentrations of each cytokine to generate standard curves (generated by fitting a 5 parameter model). In my data these standards are saved in the column "conditions" and are labeled as (standard1, standard2 etc.). I would then like to apply these models to the intensity data for my "experiment" conditions to generate predicted "value" for my experiments.

test <- data.frame(
  condition = as.factor(c("standard1", "standard2", "standard3", "standard4", "standard5", "standard6", "standard1", "standard2", "standard3", "standard4", "standard5", "standard6", "experiment1", "experiment2", "experiment3")),
  cytokine = as.factor(c("IL1", "IL1", "IL1", "IL1", "IL1", "IL1", "CXCL1", "CXCL1", "CXCL1", "CXCL1", "CXCL1", "CXCL1", "IL1", "IL1", "CXCL1")),
  value = as.numeric(c(1000, 500, 200, 100, 50, 25, 1500, 1000, 400, 300, 50, 20, NA, NA, NA)),
intensity = as.numeric(c(1.00, 0.6, 0.3, 0.2, 0.1, 0.05, 0.95, 0.87, 0.50, 0.4, 0.2, 0.1, 0.5, 0.7, 0.1)))

My real data is much longer and has many more conditions and cytokines

I have tried the following and think it works to this point


# Make a function to fit the 5PL model
fit_5pl <- function(data) {
  drc::drm(
     value ~ intensity, 
    data = data, 
    fct = drc::LL.5(names = c("Slope", "Lower", "Upper", "EC50", "Asym"))
  )
}

# generate model for each cytokine
test_fit <- test %>% 
  nest(-cytokine) %>% 
  mutate(fit = map(data, fit_5pl))

I then get stuck trying to apply these models to the experimental data. I have tried creating a loop and also emulating the work in the given examples but just cannot manage it!

Any suggestions, especially using tidyverse or tidymodels would be greatly appreciated! Happy to also include what I have tried but non of it has worked thus far. Many thanks!


Solution

  • Perhaps you are looking for something like this, with a helping hand from broom::augment.

    test |> 
      # make a variable that separates the standards from the experimental data
      mutate(type = ifelse(str_detect(condition, "standard"), "standard", "experiment")) |> 
      nest(.by = c(cytokine, type)) |> 
      # we want 1 row per cytokine
      pivot_wider(names_from = type, values_from = data) |> 
      mutate(
        # fit the standard curve
        fit = map(standard, fit_5pl),
        # then predict the values for the experimental condition
        predicted = map2(fit, experiment, \(m, nd) broom::augment(m, newdata = nd))
      ) |> 
      unnest(predicted) |> 
      select(cytokine, condition, intensity, .fitted)
    
    # A tibble: 3 × 4
      cytokine condition   intensity .fitted
      <fct>    <fct>           <dbl>   <dbl>
    1 IL1      experiment1       0.5   389. 
    2 IL1      experiment2       0.7   621. 
    3 CXCL1    experiment3       0.1    41.7
    

    Or, if you want to predict over all values, it's simpler:

    test |> 
      nest(.by = c(cytokine)) |> 
      mutate(
        # fit the standard curve
        fit = map(data, fit_5pl),
        # then predict the values
        predicted = map2(fit, data, \(m, nd) broom::augment(m, newdata = nd))
      ) |> 
      unnest(predicted) |> 
      select(cytokine, condition, intensity, value, .fitted)
    
    # A tibble: 15 × 5
       cytokine condition   intensity value .fitted
       <fct>    <fct>           <dbl> <dbl>   <dbl>
     1 IL1      standard1        1     1000  1000. 
     2 IL1      standard2        0.6    500   502. 
     3 IL1      standard3        0.3    200   191. 
     4 IL1      standard4        0.2    100   111. 
     5 IL1      standard5        0.1     50    47.3
     6 IL1      standard6        0.05    25    24.3
     7 IL1      experiment1      0.5     NA   389. 
     8 IL1      experiment2      0.7     NA   621. 
     9 CXCL1    standard1        0.95  1500  1389. 
    10 CXCL1    standard2        0.87  1000  1153. 
    11 CXCL1    standard3        0.5    400   369. 
    12 CXCL1    standard4        0.4    300   239. 
    13 CXCL1    standard5        0.2     50    77.7
    14 CXCL1    standard6        0.1     20    41.7
    15 CXCL1    experiment3      0.1     NA    41.7