python r dataframe networkx nested-lists

How to extract specific items from a nested list and append to new column?


I have a data frame with a column that contains nested lists. I am struggling to extract the usernames from these nested lists (I am quite new to this).

Dummy data:

myNestedList <- list("1" = list('username' = "test",
                              "uninteresting data" = "uninteresting content"),
                     "2" = list('username' = "test2",
                                "uninteresting data" = "uninteresting content"))
Column1 <- c("A","B","C")
column2 <- c("a","b","c")
mydf <- data.frame(Column1, column2)
mydf$nestedlist <- list(myNestedList)

I would like to extract all usernames for each row and append them to a new column. If a row has more than one username, the second/third/n-th username should just be appended with a separating ",". I have tried something like sapply(mydf$nestedlist, `[[`, 1), but this just gives me one list for the entire "nestedlist" column.

For context: I am trying to build a directed graph for further use in NetworkX or Gephi. The values in Column1 are the nodes and the usernames are mentions, hence edges. If there is another way of doing this without extracting the usernames from the nested list, that would also be a solution.

Thanks in advance for any help! :)


Solution

  • If we know the nesting depth, we can use map_depth

    library(purrr)
    # at depth 2 (each numbered entry of a row), pull out the "username" element
    mydf$username <- map_depth(mydf$nestedlist, 2, pluck, "username")
    

    -output

    > mydf
      Column1 column2                                                nestedlist    username
    1       A       a test, uninteresting content, test2, uninteresting content test, test2
    2       B       b test, uninteresting content, test2, uninteresting content test, test2
    3       C       c test, uninteresting content, test2, uninteresting content test, test2
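
    For comparison, the same extraction can be sketched in base R, assuming every numbered entry carries a `username` element as in the dummy data (the inner `vapply` would fail otherwise). The outer `vapply` visits one row at a time, the inner one pulls `username` out of each entry, and `paste(collapse = ", ")` joins them into one string per row. The column name `username_chr` is made up for illustration.

    ```r
    # Rebuild the dummy data from the question
    myNestedList <- list(
      "1" = list(username = "test", "uninteresting data" = "uninteresting content"),
      "2" = list(username = "test2", "uninteresting data" = "uninteresting content")
    )
    mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
    mydf$nestedlist <- list(myNestedList)

    # Outer vapply: one row at a time; inner vapply: one numbered entry at a time
    mydf$username_chr <- vapply(
      mydf$nestedlist,
      function(row) {
        users <- vapply(row, function(item) item[["username"]], character(1))
        paste(users, collapse = ", ")
      },
      character(1)
    )
    mydf$username_chr
    ```

    This also hints at why `sapply(mydf$nestedlist, `[[`, 1)` did not help: it selects the first numbered entry of each row as a whole rather than the `username` inside each entry.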
    

    Or, if the depth is not known, apply a recursive function with a condition that checks for the name 'username'

    library(rrapply)
    mydf$username <- rrapply(mydf$nestedlist,  
        condition = function(x, .xname) .xname %in% 'username', how = 'prune')
    > mydf
      Column1 column2                                                nestedlist    username
    1       A       a test, uninteresting content, test2, uninteresting content test, test2
    2       B       b test, uninteresting content, test2, uninteresting content test, test2
    3       C       c test, uninteresting content, test2, uninteresting content test, test2
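
    The idea behind the recursive approach can be sketched without any package. The helper below, `collect_by_name`, is a hypothetical name and not part of rrapply; it walks a nested list of arbitrary depth and collects every non-list element whose name matches `key`. The extra `deeper` level in the sample data is an assumption added only to show that depth does not matter.

    ```r
    # Hypothetical helper: collect every non-list element named `key`, at any depth
    collect_by_name <- function(x, key = "username") {
      if (!is.list(x)) return(character(0))
      hits <- character(0)
      nms <- names(x)
      for (i in seq_along(x)) {
        if (!is.null(nms) && identical(nms[i], key) && !is.list(x[[i]])) {
          # name matches and the value is a leaf: keep it
          hits <- c(hits, as.character(x[[i]]))
        } else {
          # otherwise descend (returns character(0) for non-matching leaves)
          hits <- c(hits, collect_by_name(x[[i]], key))
        }
      }
      hits
    }

    nested <- list(
      "1" = list(username = "test", "uninteresting data" = "uninteresting content"),
      "2" = list(username = "test2", extra = list(deeper = list(username = "test3")))
    )
    collect_by_name(nested)
    ```

    Applied per row, something like `vapply(mydf$nestedlist, function(row) paste(collect_by_name(row), collapse = ", "), character(1))` would give one comma-separated string for each row.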
    

    If we want to paste them into a single string per row, use

    library(purrr)    # invoke()
    library(stringr)  # str_c()
    library(dplyr)    # %>%
    mydf$username <- rrapply(mydf$nestedlist,
        condition = function(x, .xname) .xname %in% 'username',
        how = 'bind') %>%
        invoke(str_c, sep = ", ", .)
    > mydf
      Column1 column2                                                nestedlist    username
    1       A       a test, uninteresting content, test2, uninteresting content test, test2
    2       B       b test, uninteresting content, test2, uninteresting content test, test2
    3       C       c test, uninteresting content, test2, uninteresting content test, test2
    

    -structure

    > str(mydf)
    'data.frame':   3 obs. of  4 variables:
     $ Column1   : chr  "A" "B" "C"
     $ column2   : chr  "a" "b" "c"
     $ nestedlist:List of 3
      ..$ :List of 2
      .. ..$ 1:List of 2
      .. .. ..$ username          : chr "test"
      .. .. ..$ uninteresting data: chr "uninteresting content"
      .. ..$ 2:List of 2
      .. .. ..$ username          : chr "test2"
      .. .. ..$ uninteresting data: chr "uninteresting content"
      ..$ :List of 2
      .. ..$ 1:List of 2
      .. .. ..$ username          : chr "test"
      .. .. ..$ uninteresting data: chr "uninteresting content"
      .. ..$ 2:List of 2
      .. .. ..$ username          : chr "test2"
      .. .. ..$ uninteresting data: chr "uninteresting content"
      ..$ :List of 2
      .. ..$ 1:List of 2
      .. .. ..$ username          : chr "test"
      .. .. ..$ uninteresting data: chr "uninteresting content"
      .. ..$ 2:List of 2
      .. .. ..$ username          : chr "test2"
      .. .. ..$ uninteresting data: chr "uninteresting content"
     $ username  : chr  "test, test2" "test, test2" "test, test2"
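
    Since the stated goal is a directed graph, the pasted `username` column can be turned into an edge list with one row per mention. This is only a sketch under assumptions: the column names `Source` and `Target` follow the convention Gephi's CSV edge import expects, the file name `edges.csv` is hypothetical, and the input data frame is rebuilt here so the snippet runs on its own.

    ```r
    # Assumed input: one row per node, mentions already pasted into one string
    mydf <- data.frame(
      Column1 = c("A", "B", "C"),
      username = c("test, test2", "test, test2", "test, test2"),
      stringsAsFactors = FALSE
    )

    # Split each string back into individual usernames
    targets <- strsplit(mydf$username, ", ", fixed = TRUE)

    # One row per (source, target) pair; rep() repeats each node once per mention
    edges <- data.frame(
      Source = rep(mydf$Column1, lengths(targets)),
      Target = unlist(targets),
      stringsAsFactors = FALSE
    )
    edges

    # Uncomment to write the edge list to disk (hypothetical file name)
    # write.csv(edges, "edges.csv", row.names = FALSE)
    ```

    On the Python side, such a file could then be read with pandas and passed to `networkx.from_pandas_edgelist` with `create_using=nx.DiGraph` to keep the edge direction.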