Retaining Original List Element Names in Loop-Based Function Application in R


I'm encountering an issue with preserving the original names of list elements when applying a function within a loop in R. Specifically, I have a function that relies on both the value and the name of the list element. While calling the function directly yields the desired results, using it within a loop (e.g., with lapply) leads to loss of the original element names. How can I ensure that the function retains the correct names of the list elements when applied in a loop?

The first part of the question is a minimal reprex; the actual scenario follows further down.

Minimal reprex

# function definition

test_function <- 
    \(x) {
        list(deparse(substitute(x)),
             x)
    }

# minimal example input

arg_name <- "value"

test_function(arg_name)

[[1]]
[1] "arg_name"

[[2]]
[1] "value"

I need to loop-apply the function to a list:

# example list

my_list <- list(element_1 = arg_name, element_2 = arg_name)

lapply(my_list, test_function)

I expected lapply(...) to return:

$element_1
$element_1[[1]]
[1] "element_1"

$element_1[[2]]
[1] "value"


$element_2
$element_2[[1]]
[1] "element_2"

$element_2[[2]]
[1] "value"

But I am getting this instead:


$element_1
$element_1[[1]]
[1] "X[[i]]"

$element_1[[2]]
[1] "value"


$element_2
$element_2[[1]]
[1] "X[[i]]"

$element_2[[2]]
[1] "value"

I suspect lapply passes its elements without their original names to its FUN argument, the elements are renamed internally, and the FUN call is similar to (but not quite equal to) this:

`X[[i]]` <- X[[i]]
FUN(`X[[i]]`)
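
One way to check this suspicion is to have FUN report what it sees. The sketch below uses a hypothetical helper, `show_call` (not part of the reprex above), that records the deparsed argument expression and the call as observed from inside the function.

```r
# Hypothetical diagnostic helper (not part of the reprex above):
# records how the argument and the call look from inside FUN.
show_call <- function(x) {
  list(arg_expr = deparse(substitute(x)),  # expression supplied for `x`
       call     = deparse(sys.call()))     # the call that invoked this function
}

my_list <- list(element_1 = "value", element_2 = "value")
seen <- lapply(my_list, show_call)

seen$element_1$arg_expr
# [1] "X[[i]]"

# Inspect the full call that lapply constructed
seen$element_1$call
```

If the element name does not appear anywhere in that call, `substitute()` has nothing to recover it from, so the name would have to be passed to the function explicitly.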

Is there a way to use the actual names of the input list elements without losing the information inside looping functions?

Actual scenario

The actual problem I am working on is a bit more complex and is detailed below, although the reprex above should reproduce the issue.

data

list_of_2x2_tables <- list(
    Arthritis = with(vcd::Arthritis, table(Sex, improved = Improved != "None")),
    Toothgrowth = with(ToothGrowth, table(supp, dose = dose >= 1)))

individual_table_from_the_Global_env <-
    list_of_2x2_tables[[1]]

function

compare_chisq_tests <- function(x) {

    # dataset_name
    dataset <- deparse(substitute(x))
    
    # extract dimnames to use as scores argument downstream
    names <- names(dimnames(x))
    
    # transform scores into proper named list of sequences
    scores_argument <-
        list()
    for (i in names) {
        scores_argument[[i]] <- 1:2
    }
    
    # create two chisq_test objects, one base chisq, the other with ordered elements
    a <- coin::chisq_test(x)
    b <- coin::chisq_test(x, scores = scores_argument)
    output <- dplyr::tibble(
        dataset = rep(dataset,
                      2),
        test = c(slot(a, "method"),
                 slot(b, "method")),
        p.value = c(coin::pvalue(a),
                    coin::pvalue(b))
    )
    output
}

expected output

#with single element
compare_chisq_tests(individual_table_from_the_Global_env)

# A tibble: 2 × 3
  dataset                              test                              p.value
  <chr>                                <chr>                               <dbl>
1 individual_table_from_the_Global_env Pearson Chi-Squared Test                1
2 individual_table_from_the_Global_env Linear-by-Linear Association Test       1

# with lapply
lapply(list_of_2x2_tables, compare_chisq_tests)

$Arthritis
# A tibble: 2 × 3
  dataset    test                              p.value
  <chr>      <chr>                               <dbl>
1 Arthritis  Pearson Chi-Squared Test           0.0317
2 Arthritis  Linear-by-Linear Association Test  0.0317

$Toothgrowth
# A tibble: 2 × 3
  dataset      test                              p.value
  <chr>        <chr>                               <dbl>
1 Toothgrowth  Pearson Chi-Squared Test                1
2 Toothgrowth  Linear-by-Linear Association Test       1

# see that the variable `dataset` reflects the name of the dataset used in the iteration

Actual output

$Arthritis
# A tibble: 2 × 3
  dataset test                              p.value
  <chr>   <chr>                               <dbl>
1 X[[i]]  Pearson Chi-Squared Test           0.0317
2 X[[i]]  Linear-by-Linear Association Test  0.0317

$Toothgrowth
# A tibble: 2 × 3
  dataset test                              p.value
  <chr>   <chr>                               <dbl>
1 X[[i]]  Pearson Chi-Squared Test                1
2 X[[i]]  Linear-by-Linear Association Test       1

# `dataset` is always the constant `X[[i]]`.

Solution

  • Some thoughts.

    If we start with a named list,

    obj <- setNames(list(arg_name, arg_name), c("arg_name", "arg_name"))
    obj
    # $arg_name
    # [1] "value"
    # $arg_name
    # [1] "value"
    

    then we can use Map instead of lapply with a two-arg function:

    test_function <- \(x, nm) { list(nm, x) }
    Map(test_function, obj, names(obj))
    # $arg_name
    # $arg_name[[1]]
    # [1] "arg_name"
    # $arg_name[[2]]
    # [1] "value"
    # $arg_name
    # $arg_name[[1]]
    # [1] "arg_name"
    # $arg_name[[2]]
    # [1] "value"
    
    purrr::imap(obj, ~ test_function(.x, .y))
    ### same output
    

    (you can unname(..) if you don't want the top-level names).
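
For the actual scenario, the same two-argument idea might look like the sketch below. `describe_table` is a hypothetical stand-in for `compare_chisq_tests`: it skips the coin tests so that it runs with base R only. The `dataset` argument and its default are my assumption, not something from the question. The default keeps direct calls working as before, while `Map` supplies the list names during iteration.

```r
# Hypothetical stand-in for the question's function; `dataset` is an added
# argument that falls back to the deparsed expression for direct calls.
describe_table <- function(x, dataset = deparse(substitute(x))) {
  data.frame(dataset = dataset, n_obs = sum(x))
}

# Two 2x2 tables built from datasets that ship with base R
tables <- list(
  ToothGrowth = with(ToothGrowth, table(supp, dose = dose >= 1)),
  mtcars      = with(mtcars, table(am, vs))
)

# Direct call: the name is still recovered from the call itself.
single_table <- tables[[1]]
describe_table(single_table)$dataset
# [1] "single_table"

# Iteration: the list names are passed explicitly as the second argument.
results <- Map(describe_table, tables, names(tables))
results$mtcars$dataset
# [1] "mtcars"
```

In the real function, only the signature would need to change; the body can keep using `dataset` exactly as it does now.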

    Or ... you can keep the 1-arg function and apply the names after-the-fact:

    test_function <- \(x) { list(x); }
    lapply(obj, test_function) |>
      Map(names(obj), f = c)
    # $arg_name
    # $arg_name[[1]]
    # [1] "value"
    # $arg_name[[2]]
    # [1] "arg_name"
    # $arg_name
    # $arg_name[[1]]
    # [1] "value"
    # $arg_name[[2]]
    # [1] "arg_name"
    

    (albeit reversed, easily fixed with a trivial wrapper around my naive c).
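
A minimal sketch of such a wrapper, assuming the one-argument `test_function` from above and a list with distinct names. `prepend_name` is a hypothetical helper name; it just puts the name in front of whatever the function returned.

```r
obj <- list(element_1 = "value", element_2 = "value")
test_function <- \(x) list(x)

# Hypothetical wrapper: name first, then the function's own result.
prepend_name <- \(res, nm) c(list(nm), res)

out <- lapply(obj, test_function) |>
  Map(names(obj), f = prepend_name)

out$element_1
# [[1]]
# [1] "element_1"
#
# [[2]]
# [1] "value"
```

This keeps the function itself unaware of names, at the cost of a second pass over the results.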