
R - how to get a multi-level list using list.files


I have a folder structure like this:

- ConditionA
     - Subcondition1
          - data1.Rds
          - data2.Rds
     - Subcondition2
          - data1.Rds
          - data2.Rds
- ConditionB
     - Subcondition1
          - data1.Rds
          - data2.Rds
     - Subcondition2
          - data1.Rds
          - data2.Rds

Calling list.files(recursive = TRUE, full.names = TRUE) gives the following:

"./ConditionA/Subcondition1/data1.Rds"
"./ConditionA/Subcondition1/data2.Rds"
"./ConditionA/Subcondition2/data1.Rds"
"./ConditionA/Subcondition2/data2.Rds"
"./ConditionB/Subcondition1/data1.Rds"
"./ConditionB/Subcondition1/data2.Rds"
"./ConditionB/Subcondition2/data1.Rds"
"./ConditionB/Subcondition2/data2.Rds"

However, what I want instead is a list of lists that mirrors the nested folder structure. It should be identical to the list I construct manually here:

sublist1 <- list("data1.Rds", "data2.Rds")
sublist2 <- list("data1.Rds", "data2.Rds")
sublist3 <- list("data1.Rds", "data2.Rds")
sublist4 <- list("data1.Rds", "data2.Rds")

sublist5 <- list(sublist1, sublist2)
names(sublist5) <- c("Subcondition1", "Subcondition2")

sublist6 <- list(sublist3, sublist4)
names(sublist6) <- c("Subcondition1", "Subcondition2")

final_list <- list(sublist5, sublist6)
names(final_list) <- c("ConditionA", "ConditionB")

Printing it:

final_list

gives this output:

$ConditionA
$ConditionA$Subcondition1
$ConditionA$Subcondition1[[1]]
[1] "data1.Rds"

$ConditionA$Subcondition1[[2]]
[1] "data2.Rds"


$ConditionA$Subcondition2
$ConditionA$Subcondition2[[1]]
[1] "data1.Rds"

$ConditionA$Subcondition2[[2]]
[1] "data2.Rds"



$ConditionB
$ConditionB$Subcondition1
$ConditionB$Subcondition1[[1]]
[1] "data1.Rds"

$ConditionB$Subcondition1[[2]]
[1] "data2.Rds"


$ConditionB$Subcondition2
$ConditionB$Subcondition2[[1]]
[1] "data1.Rds"

$ConditionB$Subcondition2[[2]]
[1] "data2.Rds"

How can I build this list automatically instead of constructing it manually?


Solution

  • A fun exercise is to do this with a recursive function.

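    # L: list of character vectors, one per path, each split into its components
    fun <- function(L) {
      len1 <- lengths(L) == 1   # TRUE for paths that are files at this level
      c(
        L[len1],                # files become leaves
        # group the remaining paths by their first component (the directory),
        # drop that component, and recurse into each group
        if (any(!len1)) lapply(
          split(lapply(L[!len1], `[`, -1), sapply(L[!len1], `[[`, 1)),
          fun)
      )
    }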
    

    Using a similar tree hierarchy:

    list.files(recursive = TRUE, full.names = TRUE)
    # [1] "./ConditionA/Subcondition1/data1.Rds" "./ConditionA/Subcondition1/data2.Rds" "./ConditionA/Subcondition2/data1.Rds"
    # [4] "./ConditionA/Subcondition2/data2.Rds" "./ConditionB/Subcondition1/data1.Rds" "./ConditionB/Subcondition1/data2.Rds"
    # [7] "./ConditionB/Subcondition2/data1.Rds" "./ConditionB/Subcondition2/data2.Rds"
    

    We can do this:

    list.files(recursive = TRUE, full.names = TRUE) |>
      # strip the leading "./"; without this, everything is nested under a
      # top-level element named "." (the `_` placeholder needs R >= 4.2)
      sub("^\\./", "", x = _) |>
      strsplit("/") |>
      fun() |>
      str()
    # List of 2
    #  $ ConditionA:List of 2
    #   ..$ Subcondition1:List of 2
    #   .. ..$ : chr "data1.Rds"
    #   .. ..$ : chr "data2.Rds"
    #   ..$ Subcondition2:List of 2
    #   .. ..$ : chr "data1.Rds"
    #   .. ..$ : chr "data2.Rds"
    #  $ ConditionB:List of 2
    #   ..$ Subcondition1:List of 2
    #   .. ..$ : chr "data1.Rds"
    #   .. ..$ : chr "data2.Rds"
    #   ..$ Subcondition2:List of 2
    #   .. ..$ : chr "data1.Rds"
    #   .. ..$ : chr "data2.Rds"
    

    (This is the same structure as the desired list, shown compactly with str().)
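
To check the approach without touching a real project folder, here is a minimal sketch that rebuilds the folder layout from the question in a temporary directory and compares the result against a hand-built list. The names `root`, `nested_list_demo`, `leaf`, `per_condition` and `expected` are made up for this illustration. It also assumes that calling list.files() with full.names = FALSE is acceptable, which returns paths relative to the directory and so avoids the "./" prefix altogether.

```r
# Recursive helper from the answer, repeated so this block runs on its own.
fun <- function(L) {
  len1 <- lengths(L) == 1
  c(
    L[len1],
    if (any(!len1)) lapply(
      split(lapply(L[!len1], `[`, -1), sapply(L[!len1], `[[`, 1)),
      fun)
  )
}

# Build a throwaway copy of the folder layout (hypothetical demo location).
root <- file.path(tempdir(), "nested_list_demo")
unlink(root, recursive = TRUE)
for (cond_dir in c("ConditionA", "ConditionB")) {
  for (sub_dir in c("Subcondition1", "Subcondition2")) {
    dir.create(file.path(root, cond_dir, sub_dir), recursive = TRUE)
    # Empty placeholder files; only the names matter here.
    file.create(file.path(root, cond_dir, sub_dir, c("data1.Rds", "data2.Rds")))
  }
}

# Relative paths, e.g. "ConditionA/Subcondition1/data1.Rds"
paths <- list.files(root, recursive = TRUE, full.names = FALSE)
tree <- fun(strsplit(paths, "/"))

# Hand-built reference with the same shape as the list in the question.
leaf <- list("data1.Rds", "data2.Rds")
per_condition <- list(Subcondition1 = leaf, Subcondition2 = leaf)
expected <- list(ConditionA = per_condition, ConditionB = per_condition)

# Compare the automated result with the manual one.
print(all.equal(tree, expected))
str(tree)
```

Individual files can then be reached by name and position, for example tree$ConditionA$Subcondition2[[1]].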