Search code examples
rmetadata

Obtain row means for columns in same group using metadata file


I have been trying to figure out a more efficient way to summarize rowMeans for various samples belonging to the same group. I couldn't find a good post to what I would think would be a common problem.

I'd like to have something more automated using a metadata file but can't figure it out. Any guidance would be greatly appreciated!

file looks like this:

Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4
Target-1 1 100 105 54 55
Target-1 2 120 125 57 58

metadata looks like this:

group sample
Control Sample_1
Control Sample_2
Test Sample_3
Test Sample_4

My Expected output looks like this:

Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Control_Mean Test_Mean
Target-1 1 100 105 54 55 102.5 54.5
Target-1 2 120 125 57 58 122.5 57.5

I am able to generate the expected output using

file$Control_Mean <- rowMeans(file[ , c(3,4)], na.rm=FALSE)
file$Test_Mean <- rowMeans(file[ , c(5,6)], na.rm=FALSE)

But I was wondering if there is a more automated approach, as the group names in my metadata file will be different all of the time and not just called Control or Test. Any suggestions? Thanks!


Solution

  • Here is a solution using dplyr and tidyr

    Packaged

    library(dplyr)
    library(tidyr)
    

    Input Data

    # Data
    file <- tibble::tribble(
      ~Target_Name, ~Target_position, ~Sample_1, ~Sample_2, ~Sample_3, ~Sample_4,
        "Target-1",               1L,      100L,      105L,       54L,       55L,
        "Target-1",               2L,      120L,      125L,       57L,       58L
      )
    
    
    metadata <- tibble::tribble(
         ~group,    ~sample,
      "Control", "Sample_1",
      "Control", "Sample_2",
         "Test", "Sample_3",
         "Test", "Sample_4"
      )
    
    

    Solution

    file %>% 
      pivot_longer(Sample_1:Sample_4, names_to = "sample", values_to = "value") %>% 
      left_join(metadata) %>% 
      group_by(Target_position, group) %>% 
      summarise(mean = mean(value)) %>% 
      ungroup() %>% 
      pivot_wider(id_cols = Target_position, names_from = "group", values_from = "mean",
                  names_prefix = "Mean_") %>% 
      left_join(file) %>% 
      select(Target_Name, Target_position, Sample_1:Sample_4, starts_with("Mean"))
    
    

    Output

    ## A tibble: 2 × 8
    #  Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Mean_Control Mean_Test
    #  <chr>                 <int>    <int>    <int>    <int>    <int>        <dbl>     <dbl>
    #1 Target-1                  1      100      105       54       55         102.5     54.5
    #
    #2 Target-1                  2      120      125       57       58         122.5     57.5
    

    Created on 2022-12-08 with reprex v2.0.2