Search code examples
rlistgroup-bystring-matchingsapply

Group columns in dataframe by list and mutate in R


I have a large dataframe containing binary columns. Here is a list of the column names:

[1] "imagetag_logos_position_Apple_BOTTOM_CENTER"         "imagetag_logos_position_Apple_BOTTOM_LEFT"           "imagetag_logos_position_Apple_BOTTOM_RIGHT"          "imagetag_logos_position_Apple_CENTER"                "imagetag_logos_position_Apple_CENTER_LEFT"          
  [6] "imagetag_logos_position_Apple_CENTER_RIGHT"          "imagetag_logos_position_Apple_TOP_CENTER"            "imagetag_logos_position_Apple_TOP_LEFT"              "imagetag_logos_position_Apple_TOP_RIGHT"             "imagetag_logos_position_Banana_BOTTOM_CENTER"       
 [11] "imagetag_logos_position_Banana_BOTTOM_LEFT"          "imagetag_logos_position_Banana_BOTTOM_RIGHT"         "imagetag_logos_position_Banana_CENTER_LEFT"          "imagetag_logos_position_Banana_CENTER_RIGHT"         "imagetag_logos_position_Banana_TOP_RIGHT"           
 [16] "imagetag_logos_position_Pear_BOTTOM_CENTER"      "imagetag_logos_position_Pear_BOTTOM_LEFT"        "imagetag_logos_position_Pear_BOTTOM_RIGHT"       "imagetag_logos_position_Pear_CENTER"             "imagetag_logos_position_Pear_CENTER_LEFT"       
 [21] "imagetag_logos_position_Pear_CENTER_RIGHT"       "imagetag_logos_position_Pear_TOP_RIGHT"          "imagetag_logos_position_Kiwi_BOTTOM_CENTER"    "imagetag_logos_position_Kiwi_BOTTOM_LEFT"      "imagetag_logos_position_Kiwi_BOTTOM_RIGHT"    
 [26] "imagetag_logos_position_Kiwi_CENTER_LEFT"      "imagetag_logos_position_Kiwi_CENTER_RIGHT"     "imagetag_logos_position_Kiwi_TOP_LEFT"         "Product_position_Product_0"                         "Product_position_Product_BOTTOM_CENTER"            
 [31] "Product_position_Product_BOTTOM_LEFT"               "Product_position_Product_BOTTOM_RIGHT"              "Product_position_Product_CENTER"                    "Product_position_Product_CENTER_LEFT"               "Product_position_Product_CENTER_RIGHT"             
 [36] "Product_position_Product_TOP_CENTER"                "Product_position_Product_TOP_LEFT"                  "Product_position_Product_TOP_RIGHT"                 "Person_position_Person_0"                           "Person_position_Person_BOTTOM_CENTER"              
 [41] "Person_position_Person_BOTTOM_LEFT"                 "Person_position_Person_BOTTOM_RIGHT"                "Person_position_Person_CENTER"                      "Person_position_Person_CENTER_LEFT"                 "Person_position_Person_CENTER_RIGHT"               
 [46] "Person_position_Person_TOP_CENTER"                  "Person_position_Person_TOP_LEFT"                    "Person_position_Person_TOP_RIGHT"                   "Logo_position_Logo_0"                               "Logo_position_Logo_BOTTOM_CENTER"                  
 [51] "Logo_position_Logo_BOTTOM_LEFT"                     "Logo_position_Logo_BOTTOM_RIGHT"                    "Logo_position_Logo_CENTER"                          "Logo_position_Logo_CENTER_LEFT"                     "Logo_position_Logo_CENTER_RIGHT"                   
 [56] "Logo_position_Logo_TOP_CENTER"                      "Logo_position_Logo_TOP_LEFT"                        "Logo_position_Logo_TOP_RIGHT"                       "CTA_ShopNow_position_Shop Now_0"                    "CTA_ShopNow_position_Shop Now_BOTTOM_CENTER"       
 [61] "CTA_ShopNow_position_Shop Now_BOTTOM_LEFT"          "CTA_ShopNow_position_Shop Now_BOTTOM_RIGHT"         "CTA_ShopNow_position_Shop Now_CENTER"               "CTA_ShopNow_position_Shop Now_CENTER_LEFT"          "CTA_ShopNow_position_Shop Now_CENTER_RIGHT"        
 [66] "CTA_ShopNow_position_Shop Now_TOP_CENTER"           "CTA_ShopNow_position_Shop Now_TOP_RIGHT"            "CTA_JoinNow_position_Join Now_0"                    "CTA_JoinNow_position_Join Now_BOTTOM_CENTER"        "CTA_JoinNow_position_Join Now_BOTTOM_LEFT"         
 [71] "CTA_JoinNow_position_Join Now_BOTTOM_RIGHT"         "CTA_JoinNow_position_Join Now_CENTER"               "CTA_JoinNow_position_Join Now_CENTER_RIGHT"         "CTA_JoinNow_position_Join Now_TOP_CENTER"           "CTA_JoinNow_position_Join Now_TOP_RIGHT"           
 [76] "CTA_position_CTA_0"                                 "CTA_position_CTA_BOTTOM_CENTER"                     "CTA_position_CTA_BOTTOM_LEFT"                       "CTA_position_CTA_BOTTOM_RIGHT"                      "CTA_position_CTA_CENTER"                           
 [81] "CTA_position_CTA_CENTER_LEFT"                       "CTA_position_CTA_CENTER_RIGHT"                      "CTA_position_CTA_TOP_CENTER"                        "CTA_position_CTA_TOP_LEFT"                          "CTA_position_CTA_TOP_RIGHT"                        
 [86] "Text_position_text_BOTTOM_CENTER"                   "Text_position_text_BOTTOM_LEFT"                     "Text_position_text_BOTTOM_RIGHT"                    "Text_position_text_CENTER"                          "Text_position_text_CENTER_LEFT"                    
 [91] "Text_position_text_CENTER_RIGHT"                    "Text_position_text_TOP_CENTER"                      "Text_position_text_TOP_LEFT"                        "Text_position_text_TOP_RIGHT"                       "Product_position_Product_0_LF"                     
 [96] "Product_position_Product_BOTTOM_CENTER_LF"          "Product_position_Product_BOTTOM_LEFT_LF"            "Product_position_Product_BOTTOM_RIGHT_LF"           "Product_position_Product_CENTER_LF"                 "Product_position_Product_CENTER_LEFT_LF"           
[101] "Product_position_Product_CENTER_RIGHT_LF"           "Product_position_Product_TOP_CENTER_LF"             "Product_position_Product_TOP_LEFT_LF"               "Product_position_Product_TOP_RIGHT_LF"              "Logo_position_Logo_0_LF"                           

I want to group some of these columns, for example sum the columns that contain "BOTTOM_CENTER", "BOTTOM_RIGHT", "BOTTOM_LEFT". However I need to group them within each prefix that matches, for example, only sum for imagetag_logos_position_Apple, and a separate sum for imagetag_logos_position_Banana.

I did this to create a list of the unique prefixes:

prefix_list <- str_extract(colnames(positionsdf),".+?(?=([A-Z])([A-Z])([A-Z]))")
prefix_list1 <- unique(prefix_list)
> prefix_list1
 [1] "imagetag_logos_position_Apple_"      "imagetag_logos_position_Banana_"     "imagetag_logos_position_Kiwi_"   "imagetag_logos_position_Pear_" NA                                   "Product_position_Product_"          "Person_position_Person_"           
 [8] "Logo_position_Logo_"                "CTA_ShopNow_position_Shop Now_"     "CTA_JoinNow_position_Join Now_"     "CTA_position_"                      "Text_position_text_"                "CTA_LearnMore_position_Learn More_" "Person_position_"  

I have tried different ways to get the dataframe to group by the string in the list so that I can perform the addition of columns but can not seem to figure out how to go about this. %in% will not support partial match so I am not sure what other function to use Thanks!

for(i in prefix_list1){
  sapply(positionsdf, function(x) i %in% x)
}

Solution

  • We may do

    sapply(prefix_list1, function(pat) {
            nm1 <- grep(pat, names(positions_df), value = TRUE)
            nm2 <- grep("BOTTOM_(CENTER|RIGHT|LEFT)", nm1, value = TRUE)
            rowSums(positions_df[nm2], na.rm = TRUE)
        })