Search code examples
rapply

How to add a new column to the output within an rapply function?


In a nested list with three levels, I need to reach the third level (i.e., resources) and check whether some special characters exist in the data. I use the code below to do it:

obj1 <- list(resource = list(bodyPart = c("leg", "arm", "knee"),side = c("LEFT", "RIGHT", "LEFT"), device = c("LLI", "LSM", "GHT"), id = c("AA", "BB", "CC")) %>%  as.data.frame(), cat = list(lab = c("aa", "bb", "cc"), id = c(23, 24, 25)) %>%  as.data.frame())
obj2 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = c("LEFT", "LEFT", "LEFT"), device = c("GOM", "LSM", "YYY"), id = c("ZZ", "DD", "FF")) %>%  as.data.frame())

x <- list(foo = c(fer = "wdb", obj1), bar = obj2)


library(tibble)

data.frame(has_invalid_character = x |> rapply(f = \(node) grepl("[^\x01-\x7F]", node)),
           content =  x |> rapply(f = \(node) node)
           ) 

Now I need to add the information included in the id column to the row names so that I can identify where each case happens in my data. for instance, instead of foo.resource.bodyPart1, I need to have foo.resource.bodyPart.AA. Similarly, foo.resource.bodyPart2, should be replaced by foo.resource.bodyPart.BB Could you please advise me on it?

sample output:

                             has_invalid_character content
foo.fer                                   FALSE     wdb
foo.resource.bodyPart1.AA                 FALSE     leg
foo.resource.bodyPart2.BB                 FALSE     arm
foo.resource.bodyPart3.CC                 FALSE    knee
foo.resource.side1.AA                     FALSE    LEFT
foo.resource.side2.BB                     FALSE   RIGHT
foo.resource.side3.CC                     FALSE    LEFT
foo.cat.lab1.23                           FALSE     aa
foo.cat.lab1.24                           FALSE     bb
foo.cat.lab1.25                           FALSE     cc

Solution

  • based on your example data:

    
    library(purrr)
    library(tibble)
    
    ## ------------------ see (1)
    modify_tree(x, leaf = \(node) {
      if(is.data.frame(node) && !is.null(node$id)){
        ids <- node$id
        lapply(node, FUN = \(col) setNames(col, ids))
      } else {
        setNames(node, 'NA')
      }
    }) |> 
    ## ------------------ see (2)
      unlist() |>
      data.frame() |>
      setNames(nm = 'content') |>
      rownames_to_column('parent') |>
      rowwise() |>
      mutate(frags = strsplit(parent, '\\.'),
             parent = paste(rev(rev(frags)[-1]),collapse =  '.'),
             id = tail(frags, 1)
             ) |>
      select(-frags) |>
      ## filter out the content of the initial id columns:
      filter(!grepl('\\.id', parent)) |>
      ## do the actual scan for unwanted characters:
      mutate(has_invalid_char = grepl('[^\x01-\x7F]', content))
    

    (1) this one operates recursively down the nested lists, checking whether the node at hand is a dataframe and bears an ID-column (named "id"). If so, it digests the dataframe by returning each column as a vector, named with the ID-column. Caution: currently ignores dataframes without ID-column. If the node is not an ID-ed dataframe, it is returned as a vector with 'NA'-character as ID.

    (2) routing reshaping and processing, finally running the character check

    # A tibble: 22 x 4
    # Rowwise: 
       parent                content id    has_invalid_char
       <chr>                 <chr>   <chr> <lgl>           
     1 foo.fer               wdb     NA    FALSE           
     2 foo.resource.bodyPart leg     AA    FALSE           
     3 foo.resource.bodyPart arm     BB    FALSE           
     4 foo.resource.bodyPart knee    CC    FALSE           
     5 foo.resource.side     LEFT    AA    FALSE           
     6 foo.resource.side     RIGHT   BB    FALSE           
     7 foo.resource.side     LEFT    CC    FALSE           
     8 foo.resource.device   LLI     AA    FALSE           
     9 foo.resource.device   LSM     BB    FALSE           
    10 foo.resource.device   GHT     CC    FALSE           
    # i 12 more rows