Search code examples
rdatatable

Reduce columns that whose names matches a pattern


I am trying to create multiple columns in my datatable that represent the sum of columns that match the given pattern by row.

df <- data.frame(first_column = c("Alpha", "Beta", "Charlie", "Tango", "Alpha, Beta,Alpha", "Alpha,Beta,Charlie", 'Tango,Tango,Tango,Tango', 'Tango,Tango,Tango, Tango', 'Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha '),
                 number_1 = 1:9,
                 number_2 = 11:19,
                 number_3 = 2:10,
                 number_4 = 12:20) 

testing <- df %>%
  mutate(number_1 = as.numeric(number_1),
         number_2 = as.numeric(number_2))%>%
  as.data.table 


testing3 = testing[,`:=` ("Total 1" = Reduce(`+`, grep("number_1|number_2", names(testing), value = TRUE)),
                          "Total 2" = Reduce(`+`, grep("number_3|number_4", names(testing), value = TRUE)))]

This is what I have tried to far to no avail. What I should see is Total 1 row 1, 12 and Total 2 row 1, 14

but I am unable to do so.


Solution

  • The reason is that the grep returns only the column names with value = TRUE, we need the value of the columns, Use .SD to subset the columns from the column names

    library(data.table)
    testing[,`:=` (
       "Total 1" = Reduce(`+`, .SD[, grep("number_1|number_2", names(.SD),
                  value = TRUE), with = FALSE]),
        "Total 2" = Reduce(`+`, .SD[, grep("number_3|number_4", names(.SD), 
         value = TRUE), with = FALSE]))]
    

    -output

    > testing
                                                              first_column number_1 number_2 number_3 number_4 Total 1 Total 2
                                                                    <char>    <num>    <num>    <int>    <int>   <num>   <int>
    1:                                                               Alpha        1       11        2       12      12      14
    2:                                                                Beta        2       12        3       13      14      16
    3:                                                             Charlie        3       13        4       14      16      18
    4:                                                               Tango        4       14        5       15      18      20
    5:                                                   Alpha, Beta,Alpha        5       15        6       16      20      22
    6:                                                  Alpha,Beta,Charlie        6       16        7       17      22      24
    7:                                             Tango,Tango,Tango,Tango        7       17        8       18      24      26
    8:                                            Tango,Tango,Tango, Tango        8       18        9       19      26      28
    9: Tango,Tango,Tango, Tango , Alpha,Beta,Charlie, Alpha, Alpha ,Alpha         9       19       10       20      28      30
    

    If there are multiple sets, we may also create a named list , Filter the list elements based on the occurence of names

    lst_names <- list(c("number_1", "number_2"), 
                     c("number_3", "number_4"), 
                     c("number_5", "number_6"))
    names(lst_names) <- paste("Total", seq_along(lst_names))
    lst_names_sub <- Filter(length, lapply(lst_names, function(x)
            intersect(x, names(testing))))
    testing[, names(lst_names_sub) := lapply(lst_names_sub, function(x) 
             Reduce(`+`, .SD[, x, with = FALSE]))]