Search code examples
rgroup-bydplyrpurrrsummarization

Summing Multiple Groups of Columns


I have a situation where my data frame contains the results of image analysis where the columns are the proportion of a particular class present in the image, such that an example dataframe class_df would look like:

id    A    B    C    D    E    F
 1 0.20 0.30 0.10 0.15 0.25 0.00 
 2 0.05 0.10 0.05 0.30 0.10 0.40
 3 0.10 0.10 0.10 0.20 0.20 0.30

Each of these classes belongs to a functional group and I want to create new columns where the proportions of each functional group are calculated from the classes. An example mapping class_fg

class         fg
    A          Z
    B          Z
    C          Z
    D          Y
    E          Y
    F          X

and the desired result would be (line added to show the desired new columns):

id    A    B    C    D    E    F |    X    Y    Z
 1 0.20 0.30 0.10 0.15 0.25 0.00 | 0.00 0.40 0.60
 2 0.05 0.10 0.05 0.30 0.10 0.40 | 0.40 0.40 0.20
 3 0.10 0.10 0.10 0.20 0.20 0.30 | 0.30 0.40 0.30

And I can do it one functional group at a time using

first_fg <- class_fg %>%
  filter(fg == "Z") %>%
  select(class) %>%
  unlist()

class_df <- class_df %>%
  mutate(Z = rowSums(select(., one_of(first_fg))))

Surely there is a better way to do this where I can calculate the row sum for each functional group without having to just repeat this code for each group? Maybe using purrr?


Solution

  • We could split the 'class_df' by 'class', loop through the list elements with map, select the columns of 'class_df' and get the rowSums

    library(tidyverse)
    class_fg %>%
        split(.$fg) %>% 
        map_df(~ class_df %>%
                    select(one_of(.x$class)) %>% 
                    rowSums) %>%
        bind_cols(class_df, .)
    #  id    A   B    C    D    E   F   X   Y   Z
    #1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
    #2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
    #3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
    

    Or do a group by nesting, and then do the rowSums by mapping over the list elements

    class_fg %>% 
       group_by(fg) %>%
       nest %>%
       mutate(out = map(data, ~  class_df %>%
                                   select(one_of(.x$class)) %>% 
                                   rowSums)) %>% 
       select(-data)  %>%
       unnest %>% 
       unstack(., out ~ fg) %>% 
       bind_cols(class_df, .)