
R: Save Each For-Loop Iteration As New Named List inside List


I have a for-loop that extracts equal-length numerical ranges and appends each range as a new list within a list. After the for-loop completes I want to turn the list of lists into a data frame and calculate the row means. I currently cannot convert the list of lists to a tibble, because the individual lists are not named.

How can I name each of the new lists as the integer associated with the loop iteration? (i.e. 1, 2, 3 ...)

A simplified example is below. My actual for-loop does something more complicated, but this example captures the issue I'm having.

Thanks!

library(tidyverse)

input_df <- data.frame(
  Name = c("A", "B", "C"),
  Start = c(100,250,350),
  Stop = c(150,300,400)
  )

All_list <- list()

for (i in seq_len(nrow(input_df))) {

  myoutput <- data.frame(
    Value = seq.int(input_df$Start[i], input_df$Stop[i])
  )

  # How do I specify that the name of added list is the value of the iteration [i]?
  All_list[[i]] <- myoutput$Value
}

All_df <- All_list %>%
  as_tibble() %>%
  rowwise() %>%
  dplyr::mutate(Average = mean(c_across(everything()))) %>%
  ungroup()

Error in `as_tibble()`:
! Columns 1, 2, and 3 must be named.
Use `.name_repair` to specify repair.

Solution

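  • As you're loading tidyverse, you can replace your loop with purrr::pmap(), which iterates over the rows of the data frame and passes each row's columns to the function as named arguments. If we rename() the Start and Stop columns to from and to, pmap() supplies them to seq.int() under the argument names it expects: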

    out_df  <- input_df |>
        rename(from = Start, to = Stop) |>
        pmap(seq.int) |>
        set_names(input_df$Name)     |>
        as_tibble()
    
    head(out_df, 3)
    # # A tibble: 3 × 3
    #       A     B     C
    #   <int> <int> <int>
    # 1   100   250   350
    # 2   101   251   351
    # 3   102   252   352 
    

    pmap() returns a list identical to the one created by your loop. We then use set_names() to name its elements after the Name column of input_df, which is what allows as_tibble() to convert it.
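
    To make the pmap() step less opaque, here is a rough base R equivalent. It is only a sketch, not part of the original pipeline: Map() walks Start and Stop in parallel much as pmap() walks the rows, and the variable name ranges is made up for illustration.

    ```r
    input_df <- data.frame(
      Name  = c("A", "B", "C"),
      Start = c(100, 250, 350),
      Stop  = c(150, 300, 400)
    )

    # One sequence per row: seq.int(100, 150), seq.int(250, 300), seq.int(350, 400)
    ranges <- Map(seq.int, input_df$Start, input_df$Stop)

    # Name each element after its row so the list can be turned into a tibble
    ranges <- setNames(ranges, input_df$Name)

    # Inspect the structure: a named list of three equal-length vectors
    str(ranges)
    ```

    Once the list has names, as_tibble() no longer has a reason to complain about unnamed columns.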

    I think you're then trying to calculate the average by row. You can use the base rowMeans() function for this:

    out_df   %>% 
        mutate(Average = rowMeans(.))
    # # A tibble: 51 × 4
    #        A     B     C Average
    #    <int> <int> <int>   <dbl>
    #  1   100   250   350    233.
    #  2   101   251   351    234.
    #  3   102   252   352    235.
    

    Incidentally you can get these means in a one-liner in base R:

    # mapply() pairs up Start and Stop directly, avoiding apply()'s coercion of the data frame to a character matrix
    rowMeans(mapply(seq, input_df$Start, input_df$Stop))
    # Same output as Average column
    

    If you must use a loop

    In response to your comment, if you have to use a loop, simply set the names when you create the All_list variable, prior to your loop. So rather than All_list <- list(), you can do:

    All_list  <- vector(mode = "list", length = nrow(input_df))  |>
        setNames(input_df$Name)
    

    It is also better to create a list of the length you will need, avoiding the well-known performance bottleneck of growing a vector in R. See p.12 of The R Inferno. (Things have improved a bit since then but it's still more efficient to create a list of the required length when it is known in advance.)
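
    Putting the pieces together, the loop version might look something like the sketch below. It uses only base R so that it runs on its own; the variable n is introduced here for convenience, and the conversion via as.data.frame() is an assumption standing in for the as_tibble() call in the question.

    ```r
    input_df <- data.frame(
      Name  = c("A", "B", "C"),
      Start = c(100, 250, 350),
      Stop  = c(150, 300, 400)
    )

    n <- nrow(input_df)

    # Preallocate a named list of the final length.
    # Use seq_len(n) instead of input_df$Name to get the names "1", "2", "3".
    All_list <- setNames(vector(mode = "list", length = n), input_df$Name)

    for (i in seq_len(n)) {
      # Assigning by position fills the slot and keeps the name set above
      All_list[[i]] <- seq.int(input_df$Start[i], input_df$Stop[i])
    }

    # All elements have the same length, so the list converts to a data frame
    All_df <- as.data.frame(All_list)
    All_df$Average <- rowMeans(All_df)

    head(All_df, 3)
    ```

    Because the names are attached before the loop runs, nothing inside the loop needs to know about them, and with tidyverse loaded as_tibble(All_list) should accept the list in the same way.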