
Plotting averaged values against baseline, run linear regression


Some data to get started:

ID <- c("A","A","A","A","B","B","B","B","C","C","C","C","C")
SampleSection <- c("Base", "First", "Second","Second","Base","First","First","Second","Base","First","First","Second","Second")
lnCort <- c(7.26, 7.68, 7.73, 7.80, 7.95, 7.16, 6.88, 7.81, 7.75, 7.75, 7.40, 8.43, 7.18)
data.frame(ID,SampleSection,lnCort)

Some individuals have multiple "lnCort" values for the "First" and "Second" SampleSections. Where an individual has more than one measure within First or Second, I'd like to average them, then create two plots: baseline lnCort on the x-axis and lnCort for First (or Second) on the y-axis. I can figure out the regression from there, but I'm having difficulty working out how to average the data this way so that I can plot it against the baseline values. Any help would be greatly appreciated!


Solution

  • It seems you have two different questions -- how to aggregate multiple observations, and how to make your plots.

    First, to average across multiple observations, group by ID and SampleSection and summarize using the mean of lnCort.

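    library(dplyr)
    
    # df_orig is the question's data stored as a data frame
    df_orig <- data.frame(ID, SampleSection, lnCort)
    
    df_aggregated <- df_orig %>% 
      group_by(ID, SampleSection) %>% 
      summarize(lnCort = mean(lnCort), .groups = "drop")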
    
    df_aggregated
    
    #> # A tibble: 9 × 3
    #>   ID    SampleSection lnCort
    #>   <chr> <chr>          <dbl>
    #> 1 A     Base            7.26
    #> 2 A     First           7.68
    #> 3 A     Second          7.76
    #> 4 B     Base            7.95
    #> 5 B     First           7.02
    #> 6 B     Second          7.81
    #> 7 C     Base            7.75
    #> 8 C     First           7.58
    #> 9 C     Second          7.80
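
    For comparison, the same per-individual means can also be computed in base R without dplyr. This is only a sketch: it rebuilds the question's data so it runs on its own, and the names df_orig and df_aggregated_base are assumptions chosen for illustration.

    ```r
    # Rebuild the question's data (the name df_orig is an assumption)
    df_orig <- data.frame(
      ID = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
      SampleSection = c("Base", "First", "Second", "Second", "Base", "First", "First",
                        "Second", "Base", "First", "First", "Second", "Second"),
      lnCort = c(7.26, 7.68, 7.73, 7.80, 7.95, 7.16, 6.88, 7.81, 7.75, 7.75,
                 7.40, 8.43, 7.18)
    )

    # Mean lnCort for every ID x SampleSection combination
    df_aggregated_base <- aggregate(lnCort ~ ID + SampleSection,
                                    data = df_orig, FUN = mean)
    df_aggregated_base
    ```

    The result should hold the same nine means as df_aggregated, although the row order may differ.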
    

    You can then use the aggregated data in your regression, e.g., lm(lnCort ~ ordered(SampleSection), data = df_aggregated).

    Next, one way to approach the plots is to pivot your data wider, then map Base to x and First or Second to y in separate plots:

    library(tidyr)
    library(ggplot2)
    
    df_aggr_wide <- df_aggregated %>% 
      pivot_wider(names_from = SampleSection, values_from = lnCort)
    
    ggplot(df_aggr_wide, aes(Base, First)) +
      geom_point() +
      geom_smooth(method = lm)
    
    ggplot(df_aggr_wide, aes(Base, Second)) +
      geom_point() +
      geom_smooth(method = lm)
    

    Created on 2022-10-27 with reprex v2.0.2
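
    Since the goal is a regression of each later section on baseline, the models behind the two plots can also be fitted directly with lm() on the wide data. The following is a minimal sketch rather than a definitive analysis: the object names fit_first and fit_second are hypothetical, and the data are rebuilt so the block runs on its own.

    ```r
    library(dplyr)
    library(tidyr)

    # Rebuild the question's data (the name df_orig is an assumption)
    df_orig <- data.frame(
      ID = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
      SampleSection = c("Base", "First", "Second", "Second", "Base", "First", "First",
                        "Second", "Base", "First", "First", "Second", "Second"),
      lnCort = c(7.26, 7.68, 7.73, 7.80, 7.95, 7.16, 6.88, 7.81, 7.75, 7.75,
                 7.40, 8.43, 7.18)
    )

    # Average within ID and SampleSection, then spread to one row per individual
    df_aggr_wide <- df_orig %>%
      group_by(ID, SampleSection) %>%
      summarize(lnCort = mean(lnCort), .groups = "drop") %>%
      pivot_wider(names_from = SampleSection, values_from = lnCort)

    # One simple linear regression per outcome, with Base as the predictor
    fit_first  <- lm(First ~ Base, data = df_aggr_wide)
    fit_second <- lm(Second ~ Base, data = df_aggr_wide)

    coef(fit_first)   # intercept and slope for First ~ Base
    coef(fit_second)  # intercept and slope for Second ~ Base
    ```

    With only three individuals in the example data, each fit has a single residual degree of freedom, so the estimates here are illustrative only.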