Using an anonymous function within a function in R


I have a list of genetic loci whose alleles are encoded as three-digit numbers stored as character strings. I have a few lines of code that go through the list and convert every code to a nucleotide base letter (i.e. A, C, G, T).

my_allele_list = list(loc1 = c("001", "002"),
                  loc2 = c("001", "003"),
                  loc3 = c("004", "001"),
                  loc4 = c("003", "003"),
                  loc5 = c("001", "002"),
                  loc6 = c("002", "004"))

a = c("001", "002", "003", "004")
b = c("A", "C", "G", "T")
for(i in seq_along(a)) my_allele_list <- 
  lapply(my_allele_list, function(x) gsub(a[i], b[i], x))

my_allele_list

So far so good, but to keep things tidy I would like to wrap these lines into a function.

convert_alleles <- function(x){
    a = c("001", "002", "003", "004")
    b = c("A", "C", "G", "T")
    for(i in seq_along(a)) x <- 
      lapply(x, function(x) gsub(a[i], b[i], x))
    }

convert_alleles(my_allele_list)

my_allele_list

However, as you can see, the function version does not work: there is no error, but the list object is unchanged. I suspect a clash with the anonymous function inside the for loop. Can someone explain what the issue is and suggest a solution?


Solution

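  • It may be easier with a vectorized function such as str_replace_all, which accepts a named vector of pattern = replacement pairs

    library(purrr)
    library(stringr)
    # setNames(b, a) builds c("001" = "A", ...): names are the patterns, values the replacements
    map(my_allele_list, ~ str_replace_all(.x, setNames(b, a)))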
    

    Output:

    $loc1
    [1] "A" "C"
    
    $loc2
    [1] "A" "G"
    
    $loc3
    [1] "T" "A"
    
    $loc4
    [1] "G" "G"
    
    $loc5
    [1] "A" "C"
    
    $loc6
    [1] "C" "T"
    

    Also, if each element is an exact code (a whole-string match rather than a partial one), as in the example, use setNames to create a named vector and index it by the codes to look up the replacements

    map(my_allele_list, ~ unname(setNames(b, a)[.x]))
    $loc1
    [1] "A" "C"
    
    $loc2
    [1] "A" "G"
    
    $loc3
    [1] "T" "A"
    
    $loc4
    [1] "G" "G"
    
    $loc5
    [1] "A" "C"
    
    $loc6
    [1] "C" "T"
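
The difference between the partial and the exact approach can be seen on a value that merely contains a known code. This is a minimal sketch in base R; the input "1001" and the name lookup are made up for illustration and do not come from the question's data.

```r
# Named lookup vector: names are the codes, values are the bases
lookup <- setNames(c("A", "C", "G", "T"), c("001", "002", "003", "004"))

# Hypothetical malformed input that contains "001" as a substring
odd_code <- "1001"

# Pattern replacement rewrites the matching substring only
partial <- gsub("001", "A", odd_code)

# Indexing the named vector requires the whole string to match a name;
# an unknown name yields NA
exact <- unname(lookup[odd_code])

partial  # "1A"
exact    # NA
```

With the lookup, unexpected codes show up as NA instead of being partially rewritten, which may make data problems easier to spot.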
    

    This can also be done in base R with lapply (the \(x) shorthand for function(x) requires R 4.1.0 or later)

    lapply(my_allele_list, \(x) unname(setNames(b, a)[x]))
    $loc1
    [1] "A" "C"
    
    $loc2
    [1] "A" "G"
    
    $loc3
    [1] "T" "A"
    
    $loc4
    [1] "G" "G"
    
    $loc5
    [1] "A" "C"
    
    $loc6
    [1] "C" "T"
    

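    The anonymous function is not the problem. In the OP's function, the for loop is the last expression evaluated, and a for loop returns NULL invisibly, so the function returns nothing visible. The return value should be x, so add x as the final expression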

    convert_alleles <- function(x){
        a = c("001", "002", "003", "004")
        b = c("A", "C", "G", "T")
        for(i in seq_along(a)) x <-
          lapply(x, function(x) gsub(a[i], b[i], x))
        x
    }
    convert_alleles(my_allele_list)
    $loc1
    [1] "A" "C"
    
    $loc2
    [1] "A" "G"
    
    $loc3
    [1] "T" "A"
    
    $loc4
    [1] "G" "G"
    
    $loc5
    [1] "A" "C"
    
    $loc6
    [1] "C" "T"
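
The missing return value can be reproduced in isolation. The sketch below uses two hypothetical helper functions (no_return and with_return are illustrative names, not part of the question) that differ only in their last expression.

```r
# The for loop is the last expression, and a for loop evaluates to NULL (invisibly)
no_return <- function(x) {
  for (i in 1:3) x <- x + i
}

# Same loop, but the modified local x is the final expression and is returned
with_return <- function(x) {
  for (i in 1:3) x <- x + i
  x
}

res_no   <- no_return(0)
res_with <- with_return(0)

is.null(res_no)  # TRUE
res_with         # 6
```

Because the NULL is returned invisibly, calling no_return(0) at the console prints nothing, which matches the "no error, no output" behaviour described in the question.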
    

    NOTE: running the function does not change the object my_allele_list itself, because the function works on its own copy of the list. To keep the result, assign it back (<-)

    my_allele_list <- convert_alleles(my_allele_list)
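
The effect of assigning back can be checked on a small example. This is a sketch only: recode and alleles are hypothetical names, and the one-locus list stands in for the full data.

```r
codes <- c("001" = "A", "002" = "C", "003" = "G", "004" = "T")

# Hypothetical helper: look up each code in the named vector
recode <- function(x) lapply(x, function(v) unname(codes[v]))

alleles <- list(loc1 = c("001", "002"))

# Calling the function without assignment leaves the original untouched
invisible(recode(alleles))
before <- alleles$loc1

# Assigning the result back replaces the original object
alleles <- recode(alleles)
after <- alleles$loc1

before  # "001" "002"
after   # "A" "C"
```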