
R: loop over list of lists to retrieve headers of sublists that contain a hit


I have a list of lists in R. Each sublist contains multiple elements, and the sublists do not necessarily all have the same length. Each sublist has its own header name. Like this:

#create list of lists
vector1 = c("apple","banana","cherry")
vector2 = c("banana","date","fig")
vector3 = c("fig","jackfruit","mango","plum")
listoflists  = list(vector1 , vector2, vector3)
names(listoflists) = c("listA", "listB", "listC")

The list of lists looks like this:

listoflists

$listA
[1] "apple"  "banana" "cherry"

$listB
[1] "banana" "date"   "fig"   

$listC
[1] "fig"       "jackfruit" "mango"     "plum"     

Next, I have a vector that contains elements that can also be found within the sublists. Like this:

wanted = c("apple","banana","fig")
wanted
[1] "apple"  "banana" "fig" 

For each element in the vector wanted, I want to extract the header names of every sublist in the list of lists that contains that element. For the example shown here, the output should look something like this:

#desired output
apple  listA
banana listA listB
fig    listB listC

I thought about putting this into a for loop to obtain something like this:

output_list = list()
for (i in wanted){
  output = EXTRACT LIST HEADER WHEN i IS PRESENT IN SUBLIST
  output_list[[i]] = output
}

However, it is not clear to me whether I can loop over the list of lists this way and, if so, how to extract the header names of only those sublists that contain the element from the vector wanted. I looked into using the unlist function, but that did not seem to be useful for this problem. I searched Stack Overflow as well as other forums but could not find any question outlining a similar problem. It would thus be really helpful if someone could point me in the right direction to solve this issue.

Thanks already!


Solution

  • There are multiple ways to get the output.

    1) One option is to loop over 'listoflists', keep only the elements of each sublist that are in 'wanted', stack the result into a two-column data.frame, and then split the 'ind' column (the sublist names) into a list by 'values'

    with(stack(lapply(listoflists, function(x) 
         x[x %in% wanted])), split(as.character(ind), values))
    #$apple
    #[1] "listA"
    
    #$banana
    #[1] "listA" "listB"
    
    #$fig
    #[1] "listB" "listC"
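
To make the intermediate steps of this approach easier to follow, here is a minimal sketch that runs them one at a time. The names `filtered`, `long`, and `result` are introduced here for illustration only and are not part of the original answer.

```r
# Example data from the question
listoflists <- list(
  listA = c("apple", "banana", "cherry"),
  listB = c("banana", "date", "fig"),
  listC = c("fig", "jackfruit", "mango", "plum")
)
wanted <- c("apple", "banana", "fig")

# Step 1: keep only the wanted elements within each sublist
filtered <- lapply(listoflists, function(x) x[x %in% wanted])

# Step 2: stack() turns the named list into a long data.frame with
# two columns: 'values' (the elements) and 'ind' (the sublist name)
long <- stack(filtered)

# Step 3: group the sublist names by element
result <- split(as.character(long$ind), long$values)
```

Note that `split()` orders the groups by the sorted unique values, so the order of the result follows alphabetical order rather than the order of `wanted`.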
    

    2) Alternatively, we can stack first into a two-column data.frame, then subset the rows whose 'values' are in 'wanted', and split

    with(subset(stack(listoflists), values %in% wanted), 
               split(as.character(ind), values))
    #$apple
    #[1] "listA"
    
    #$banana
    #[1] "listA" "listB"
    
    #$fig
    #[1] "listB" "listC"
    

    3) Another option is to loop over 'wanted' and, for each element, get the names of the sublists in 'listoflists' that contain it

    setNames(lapply(wanted, function(x) 
       names(which(sapply(listoflists, function(y) x %in% y)))), wanted)
    #$apple
    #[1] "listA"
    
    #$banana
    #[1] "listA" "listB"
    
    #$fig
    #[1] "listB" "listC"
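
For completeness, the for loop sketched in the question can be filled in along the same lines as option 3. This is only one possible way to write it; the helper name `contains_i` is an assumption made for this sketch, and `vapply()` is swapped in for `sapply()` so that the result is always a logical vector.

```r
# Example data from the question
listoflists <- list(
  listA = c("apple", "banana", "cherry"),
  listB = c("banana", "date", "fig"),
  listC = c("fig", "jackfruit", "mango", "plum")
)
wanted <- c("apple", "banana", "fig")

output_list <- list()
for (i in wanted) {
  # One TRUE/FALSE per sublist: does this sublist contain element i?
  contains_i <- vapply(listoflists, function(sublist) i %in% sublist, logical(1))
  # Keep the header names of the sublists that contain i
  output_list[[i]] <- names(listoflists)[contains_i]
}

# Print one line per wanted element, followed by the matching headers
for (nm in names(output_list)) {
  cat(nm, output_list[[nm]], "\n")
}
```

Unlike options 1 and 2, this loop (like option 3) keeps the order of `wanted` and retains an empty entry for any wanted element that is not found in any sublist.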