str_replace within mutate(across()) matching nth character from cur_column


A summary of my aim

I have the following dataframe structure:

my.df <-data.frame("col1_A.C"=c("AA","AC","CC"),
                   "col2_A.T"=c("TT","AT","TT"),
                   "col3_C.G"=c("GG","CG","CG"))

my.df
#   col1_A.C col2_A.T col3_C.G
# 1       AA       TT       GG
# 2       AC       AT       CG
# 3       CC       TT       CG

For each column, I want to replace every character that matches the third-to-last character of the column name with the character "R".

Using the above dataframe I thus would like to obtain this:

my.df2 <- data.frame("col1_A.C"=c("RR","RC","CC"),
                   "col2_A.T"=c("TT","RT","TT"),
                   "col3_C.G"=c("GG","RG","RG"))

my.df2
#   col1_A.C col2_A.T col3_C.G
# 1       RR       TT       GG
# 2       RC       RT       RG
# 3       CC       TT       RG

In the first column, for instance, the column name is col1_A.C, and A is the third-to-last character. All the A's were thus replaced with an R.

My code so far

To achieve this, I wrote the following code:

my.df2 <- my.df %>% mutate(across(.cols=everything(),
                                  .funs=str_replace_all(.,
                                                        substr(cur_column(),
                                                               nchar(cur_column()-2),
                                                               nchar(cur_column()-2)
                                                              ),
                                                        "R")
                                  )
                           )

Unfortunately, the resulting dataframe, my.df2, looks exactly like my.df: no character replacement occurred. No error is returned, though.

I have tested the str_replace_all() approach in the following way, and it works on a vector. I therefore imagine there is something I am missing or misunderstanding about how str_replace_all() is interpreted within mutate(across()).

first.column <- c("AA","AT","AA")

first.column <- str_replace_all(first.column,
                                substr(colnames(my.df)[1],
                                       nchar(colnames(my.df)[1])-2,
                                       nchar(colnames(my.df)[1])-2
                                       ),
                                "R")
print(first.column)
# [1] "RR" "RT" "RR"

I have run out of ideas about what might not be working. My understanding of R and its functions is not very thorough, so I apologise if I have missed something simple. I have also searched for similar questions, but to no avail.


Solution

  • You can use Map:

    my.df[] <- Map(function(x, y) gsub(y, 'R', x), my.df, 
          substring(names(my.df), nchar(names(my.df)) - 2,nchar(names(my.df)) - 2))
    
    my.df
    #  col1_A.C col2_A.T col3_C.G
    #1       RR       TT       GG
    #2       RC       RT       RG
    #3       CC       TT       RG
    

    Using @thelatemail's chartr trick with imap_dfc from purrr:

    purrr::imap_dfc(my.df, ~chartr(substr(.y, nchar(.y)-2, nchar(.y)-2), 'R', .x))
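
If you would rather stay with mutate(across()), the following is a minimal sketch of how the original attempt could be adjusted. It is not part of the answers above. As far as I can tell, three things in the question's code stand in the way: the function argument of across() is named .fns rather than .funs, so the expression was most likely swallowed by ... and never evaluated (which would explain the unchanged output and the missing error); the expression needs a ~ to become a lambda; and the -2 sits inside nchar() instead of after it. The sketch uses str_sub() with negative indices and fixed() as my own substitutions for the substr()/nchar() arithmetic; behaviour may vary slightly between dplyr versions.

```r
library(dplyr)
library(stringr)

my.df <- data.frame("col1_A.C" = c("AA", "AC", "CC"),
                    "col2_A.T" = c("TT", "AT", "TT"),
                    "col3_C.G" = c("GG", "CG", "CG"),
                    stringsAsFactors = FALSE)

my.df2 <- my.df %>%
  mutate(across(
    .cols = everything(),
    # `~` turns the expression into a lambda; .x is the current column
    .fns = ~ str_replace_all(
      .x,
      # third-to-last character of the current column name,
      # wrapped in fixed() so it is matched literally, not as a regex
      fixed(str_sub(cur_column(), -3, -3)),
      "R"
    )
  ))

my.df2
```

Negative positions in str_sub() count from the end of the string, so -3 picks the third-to-last character without any nchar() arithmetic.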