Adding an average line grouped by a column in R


Hi, I have data that looks like this (screenshot not included):

There are 38 columns in total, with 10 types of treatment in the Treatment column and dates from 25 to 29 in the Date column. Sample data (the example has 2 treatment types, but the real data has 10):

df <- structure(
    list(
      Christensenellaceae = c(
        0,
        0.009910731,
        0.010131195,
        0.009679938,
        0.01147601,
        0.010484508,
        0.008641566,
        0.010017172,
        0.010741488,
        0.1,
        0.2,
        0.3,
        0.4),
      Date=c(25,25,25,25,25,27,27,27,27,27,27,27,27),
      Treatment = c(
        "Original Sample",
        "Original Sample",
        "Original Sample",
        "Original Sample",
        "Original Sample",
        "Treatment 1",
        "Treatment 1",
        "Treatment 1",
        "Treatment 1",
        "Treatment 2",
        "Treatment 2",
        "Treatment 2",
        "Treatment 2")
    ),class = "data.frame",
    row.names = c(NA,-13L)
  )

What I wish to do is create two plots for every column: one for the original sample and one for all the treatment types (1-10; in this example 1-2), with mean lines of the observations for every treatment type. In total the treatment plot should have 10 average lines (here 2). Sadly, I don't understand how to add a line grouped by treatment type. Here is my code, which draws a single line based on all treatment types. How can I add one line per treatment type?

df_3 %>% 
  pivot_longer(-treatment) %>% 
  mutate(plot = ifelse(str_detect(treatment, "Original"), 
                       "Original sample", 
                       "Treatment"),
         treatment = str_extract(treatment, "\\d+$")) %>% 
  group_by(name) %>% 
  group_split() %>% 
  map(~.x %>% ggplot(aes(x = factor(treatment), y = value, color = factor(name))) +
        geom_point() +
        stat_summary(aes(y = value,group=1), fun.y=mean, colour="red", geom="line",group=1)
        +
        facet_wrap(~plot, scales = "free_x") +
        labs(x = "Treatment", y = "Value", color = "Taxa") +
        guides(x =  guide_axis(angle = 90))+
        theme_bw()) 

The resulting plot (image not included) has only one mean line, and I need 10 (here 2), one for every treatment type. Is there any way to edit my code so it will work? Thank you :)

I also tried this code, but it didn't seem to work:

df %>% 
  pivot_longer(-c(Treatment, Date), names_to = "taxon") %>% 
  mutate(
    type = Treatment %>% str_detect("Original") %>% ifelse("Original", "Treatment"),
    treatment_nr = Treatment %>% str_extract("(?<=Treatment )[0-9]+")
  ) %>% 
  ggplot(aes(Date, value, color = treatment_nr)) + 
  geom_point() + 
  stat_summary(geom = "point", fun.y = "mean", size = 3, shape = 24) + 
  geom_line() + 
  facet_grid(type ~ taxon, scales = "free_y")
#> Warning: `fun.y` is deprecated. Use `fun` instead.

Solution

  • Your sample data doesn’t match up with your plotting code (e.g. Treatment instead of treatment, and df instead of df_3), so I’m going to generate some data here instead, based on the data in your image, to illustrate the solution.

    library(tidyverse)
    set.seed(1)
    df <-
      data.frame(
        Christensenellaceae = runif(105),
        treatment = rep(c("Original Sample_25", 
                          paste0("Treatment", 1:10, "_", 27), 
                          paste0("Treatment", 1:10, "_", 28)), 
                        each = 5)
      )
    

    Because you’re drawing the mean with geom = "line", the means are joined into a single line running across the x-axis. I’ve used a fairly lazy workaround: calculate the mean before plotting and draw it as a short segment for each treatment. Depending on how it looks with your ten treatments, you can change the length of the average line by changing avg_line_length.

    Because the segments add extra x-axis values (e.g. 0.65, 1.35), the x-axis would by default include those values. I’ve set the breaks and labels explicitly to address that, using the intermediate data lab_df. I’ve left the label for the original sample blank. You could also play around with color/linetype to display the line as 'Mean' in the legend.

    avg_line_length <- 0.35
    
    p <-
      df %>% 
        pivot_longer(-treatment) %>% 
        mutate(plot = ifelse(str_detect(treatment, "Original"), 
                             "Original sample", 
                             "Treatment"),
               treatment = as.numeric(str_extract(treatment, "\\d+")),
               treatment_label = ifelse(plot %in% "Original sample", "", treatment)) %>% 
        {. ->> lab_df} %>%
        # group by taxon as well, so means are not pooled across columns
        group_by(name, treatment) %>%
        mutate(avg = mean(value),
               xstart = treatment - avg_line_length,
               xend = treatment + avg_line_length) %>%
        ungroup() %>%
        group_by(name) %>%
        group_split() %>% 
        map(~.x %>% ggplot() +
              geom_point(aes(x = treatment, y = value, color = name)) +
              geom_segment(aes(x = xstart, xend = xend, y = avg, yend = avg, color = name)) +
              scale_x_continuous(breaks = lab_df$treatment, labels = lab_df$treatment_label) +
              facet_wrap(~plot, scales = "free_x") +
              labs(x = "Treatment", y = "Value", color = "Taxa") +
              guides(x =  guide_axis(angle = 90))+
              theme_bw()) 
    
    p
    #> [[1]]
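
The break and label handling described above can also be looked at in isolation. lab_df has one row per observation, so every break value is repeated. The following is only a minimal sketch, assuming a hypothetical toy data frame (toy, labs_toy and p_toy are made-up names, and treatment 25 stands in for the original sample), that reduces the labels to one row per treatment with distinct() before passing them to scale_x_continuous():

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: treatment 25 plays the role of the original sample
toy <- data.frame(
  treatment = rep(c(25, 1, 2), each = 3),
  plot      = rep(c("Original sample", "Treatment", "Treatment"), each = 3),
  value     = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
)

# One row per treatment; the original sample gets an empty label
labs_toy <- toy %>%
  distinct(treatment, plot) %>%
  mutate(treatment_label = ifelse(plot == "Original sample",
                                  "",
                                  as.character(treatment)))

# Breaks and labels now have exactly one entry per treatment
p_toy <- ggplot(toy, aes(x = treatment, y = value)) +
  geom_point() +
  scale_x_continuous(breaks = labs_toy$treatment,
                     labels = labs_toy$treatment_label) +
  facet_wrap(~plot, scales = "free_x")
```

Whether de-duplicating the breaks changes the rendered axis depends on the data; the main benefit is that the breaks and labels are easier to inspect.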
    

    If you don’t want the average line for the original sample, an additional ifelse() sets its mean to NA so that no segment is drawn for it.

    p2 <-
      df %>% 
        pivot_longer(-treatment) %>% 
        mutate(plot = ifelse(str_detect(treatment, "Original"), 
                             "Original sample", 
                             "Treatment"),
               treatment = as.numeric(str_extract(treatment, "\\d+")),
               treatment_label = ifelse(plot %in% "Original sample", "", treatment)) %>% 
        {. ->> lab_df} %>%
        # group by taxon as well, so means are not pooled across columns
        group_by(name, treatment) %>%
        mutate(avg = ifelse(plot %in% "Original sample", NA, mean(value)),
               xstart = treatment - avg_line_length,
               xend = treatment + avg_line_length) %>%
        ungroup() %>%
        group_by(name) %>%
        group_split() %>% 
        map(~.x %>% ggplot() +
              geom_point(aes(x = treatment, y = value, color = factor(name))) +
              geom_segment(aes(x = xstart, xend = xend, y = avg, yend = avg), colour="red") +
              scale_x_continuous(breaks = lab_df$treatment, labels = lab_df$treatment_label) +
              facet_wrap(~plot, scales = "free_x") +
              labs(x = "Treatment", y = "Value", color = "Taxa") +
              guides(x =  guide_axis(angle = 90))+
              theme_bw()) 
    
    p2
    #> [[1]]
    #> Warning: Removed 5 rows containing missing values (geom_segment).
    

    It's messy, but hopefully that addresses your problem.
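
An alternative that may be worth trying, and which is not the approach used above: stat_summary() can draw one short horizontal bar per x position if fun, fun.min and fun.max are all set to mean and the geom is "errorbar". This is a hedged sketch on hypothetical toy data (toy2, p_alt and the values are made up). It uses fun rather than the deprecated fun.y, so it assumes ggplot2 3.3.0 or later, and it treats treatment as a factor so that each level forms its own group:

```r
library(ggplot2)

# Hypothetical toy data with three treatments, four observations each
toy2 <- data.frame(
  treatment = factor(rep(1:3, each = 4)),
  value     = c(1, 2, 3, 4, 2, 4, 6, 8, 5, 5, 5, 5)
)

p_alt <- ggplot(toy2, aes(x = treatment, y = value)) +
  geom_point() +
  # ymin and ymax both equal the mean, so the error bar collapses
  # to a single horizontal line at the mean of each treatment
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.7, colour = "red")

# Per-treatment means, i.e. the heights where the bars should sit
means <- tapply(toy2$value, toy2$treatment, mean)
```

Here width plays the same role as avg_line_length in the segment version. Because x stays discrete, no custom breaks or labels are needed, but the original sample would still have to go in its own facet as before.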