r dplyr across

How can I modify columns with mutate(across()) using a specific regex?


I have a problem with the mutate(across()) function. In the tibble below, I want to remove the "letter + two underscores" prefix (e.g. "p__", "c__", etc.) from the values in each column.

A tibble: 2,477 x 4
   Phylum                Class                   Order               Family                 
   <chr>                 <chr>                   <chr>               <chr>                  
 1 " p__Proteobacteria"  " c__Gammaproteobacter~ " o__Aeromonadales" " f__Aeromonadaceae"   
 2 " p__Bacteroidota"    " c__Bacteroidia"       " o__Bacteroidales" " f__Williamwhitmaniac~
 3 " p__Fusobacteriota"  " c__Fusobacteriia"     " o__Fusobacterial~ " f__Leptotrichiaceae" 
 4 " p__Firmicutes"      " c__Clostridia"        " o__Clostridiales" " f__Clostridiaceae"   
 5 " p__Proteobacteria"  " c__Gammaproteobacter~ " o__Enterobactera~ " f__Enterobacteriacea~
 6 " p__Bacteroidota"    " c__Bacteroidia"       " o__Bacteroidales" " f__Williamwhitmaniac~
 7 " p__Firmicutes"      " c__Clostridia"        " o__Lachnospirale~ " f__Lachnospiraceae"  
 8 " p__Bacteroidota"    " c__Bacteroidia"       " o__Cytophagales"  " f__Spirosomaceae"    
 9 " p__Proteobacteria"  " c__Gammaproteobacter~ " o__Burkholderial~ " f__Comamonadaceae"   
10 " p__Actinobacteriot~ " c__Actinobacteria"    " o__Frankiales"    " f__Sporichthyaceae"  
# ... with 2,467 more rows

A year ago I used the command

table <- table %>% 
  mutate_at(vars(Phylum, Class, Order, Family),funs(sub(pattern = "^([a-z])(_{2})", replacement = "", .)))

Now I get a message that funs() is no longer supported, and the code no longer works. Do you have any suggestions? I thought about:

taxon <- c("Phylum", "Class", "Order", "Family")
table <- table %>% 
  mutate(across(taxon), gsub(pattern = "^([a-z])(_{2})", replacement = "", .))

But here I get the error:

Error: Invalid index: out of bounds

Thanks a lot :) Kathrin


Solution

  • You can do:

    library(dplyr)
    
    taxon <- c("Phylum", "Class", "Order", "Family")
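    # all_of() selects the columns named in the character vector `taxon`;
    # the ~ turns the gsub() call into a function applied to each column
    table <- table %>%
      mutate(across(all_of(taxon),
                    ~ gsub(pattern = "^([a-z])(_{2})", replacement = "", .)))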
    

    I don't have your data to confirm this, but there seems to be whitespace at the beginning of each string. Because the pattern is anchored with ^, it will not match until that whitespace is removed, for example with trimws().

    # trimws() strips the leading space so that ^ can match the prefix
    table <- table %>%
      mutate(across(all_of(taxon),
                    ~ gsub(pattern = "^([a-z])(_{2})", replacement = "", trimws(.))))
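
Since the original data is not available, here is a minimal sketch on made-up data to check the idea. The tibble `taxa`, its values, and the extra `Count` column are assumptions for illustration only; the pattern is a shortened equivalent of the one in the question.

```r
library(dplyr)

# Toy data imitating the printed tibble (values invented for illustration);
# note the leading space in every string
taxa <- tibble(
  Phylum = c(" p__Proteobacteria", " p__Firmicutes"),
  Class  = c(" c__Gammaproteobacteria", " c__Clostridia"),
  Count  = c(10, 20)
)

# Columns to clean, held in a character vector
taxon <- c("Phylum", "Class")

cleaned <- taxa %>%
  mutate(across(
    all_of(taxon),                           # select columns by name
    ~ sub("^[a-z]__", "", trimws(.x))        # trim, then drop the "x__" prefix
  ))

print(cleaned)
```

Columns not listed in `taxon` (here `Count`) are left untouched. sub() is enough here because the anchored pattern can match at most once per string.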