I have a data frame with a column that contains nested lists. I am struggling to extract the usernames from these nested lists (I am quite new to this).
Dummy data:
myNestedList <- list("1" = list('username' = "test",
                                "uninteresting data" = "uninteresting content"),
                     "2" = list('username' = "test2",
                                "uninteresting data" = "uninteresting content"))
Column1 <- c("A","B","C")
column2 <- c("a","b","c")
mydf <- data.frame(Column1, column2)
mydf$nestedlist <- list(myNestedList)
I would like to extract all usernames for each row and put them in a new column. If a row has more than one username, the second, third, ..., n-th username should be appended with a separating ",".
I have tried something like sapply(mydf$nestedlist, `[[`, 1), but this just gives me one list of the entire column "nestedlist".
For context: I am trying to build a directed graph for further use in NetworkX or Gephi. The values in Column1 are the nodes and the usernames are mentions, hence edges. If there is another way of doing this without extracting the usernames from the nested list, that would also be a solution.
Thanks in advance for any help! :)
If we know the nesting level, we can use map_depth
library(purrr)
mydf$username <- map_depth(mydf$nestedlist, 2, pluck, "username")
-output
> mydf
Column1 column2 nestedlist username
1 A a test, uninteresting content, test2, uninteresting content test, test2
2 B b test, uninteresting content, test2, uninteresting content test, test2
3 C c test, uninteresting content, test2, uninteresting content test, test2
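For comparison, the same fixed-depth extraction can be sketched in base R without extra packages. This is only a sketch: it assumes every second-level entry has a character "username" element (vapply would stop with an error otherwise), and it pastes the usernames into one string per row.

```r
# Rebuild the dummy data from the question
myNestedList <- list(
  "1" = list(username = "test",  "uninteresting data" = "uninteresting content"),
  "2" = list(username = "test2", "uninteresting data" = "uninteresting content")
)
mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
mydf$nestedlist <- list(myNestedList)

# For each row, pull "username" out of every entry, then collapse to one string
mydf$username <- vapply(
  mydf$nestedlist,
  function(entries) {
    users <- vapply(entries, function(e) e[["username"]], character(1))
    paste(users, collapse = ", ")
  },
  character(1)
)
```

Unlike map_depth, this returns a plain character column rather than a list column.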
Or, if the depth is not known, apply a recursive function with a condition that checks for the 'username' name
library(rrapply)
mydf$username <- rrapply(mydf$nestedlist,
condition = function(x, .xname) .xname %in% 'username', how = 'prune')
> mydf
Column1 column2 nestedlist username
1 A a test, uninteresting content, test2, uninteresting content test, test2
2 B b test, uninteresting content, test2, uninteresting content test, test2
3 C c test, uninteresting content, test2, uninteresting content test, test2
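When the depth is unknown, a small recursive helper in base R is another option. The helper name find_usernames is hypothetical (it does not come from any package), and the sketch assumes each "username" element holds plain character values.

```r
# Rebuild the dummy data from the question
myNestedList <- list(
  "1" = list(username = "test",  "uninteresting data" = "uninteresting content"),
  "2" = list(username = "test2", "uninteresting data" = "uninteresting content")
)
mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
mydf$nestedlist <- list(myNestedList)

# Hypothetical helper: collect every element named "username" at any depth
find_usernames <- function(x) {
  if (!is.list(x)) {
    return(character(0))
  }
  is_user <- names(x) %in% "username"
  # names(x) is NULL for unnamed lists, which yields logical(0)
  if (length(is_user) == 0) is_user <- rep(FALSE, length(x))
  direct <- unlist(x[is_user], use.names = FALSE)
  nested <- unlist(lapply(x[!is_user], find_usernames), use.names = FALSE)
  c(as.character(direct), as.character(nested))
}

# One comma-separated string per row
mydf$username <- vapply(
  mydf$nestedlist,
  function(cell) paste(find_usernames(cell), collapse = ", "),
  character(1)
)
```

A cell with no usernames ends up as an empty string, because paste() on a zero-length vector with collapse returns "".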
If we want to paste them together into one string per row, use
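library(stringr)
library(dplyr)
# how = 'bind' returns the matched usernames as a data.frame;
# invoke() then passes its columns to str_c(), which pastes them with ", ".
# Note: invoke() is deprecated as of purrr 1.0.0.
mydf$username <- rrapply(mydf$nestedlist,
    condition = function(x, .xname) .xname %in% 'username',
    how = 'bind') %>%
    invoke(str_c, sep = ", ", .)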
mydf
Column1 column2 nestedlist username
1 A a test, uninteresting content, test2, uninteresting content test, test2
2 B b test, uninteresting content, test2, uninteresting content test, test2
3 C c test, uninteresting content, test2, uninteresting content test, test2
-structure
> str(mydf)
'data.frame': 3 obs. of 4 variables:
$ Column1 : chr "A" "B" "C"
$ column2 : chr "a" "b" "c"
$ nestedlist:List of 3
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
$ username : chr "test, test2" "test, test2" "test, test2"
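Regarding the graph use case in the question, one possible next step is a long edge list with one row per (node, mention) pair. This is a base R sketch; the column names source and target and the file name edges.csv are assumptions, so adjust them to whatever the graph tool's importer expects.

```r
# Rebuild the dummy data from the question
myNestedList <- list(
  "1" = list(username = "test",  "uninteresting data" = "uninteresting content"),
  "2" = list(username = "test2", "uninteresting data" = "uninteresting content")
)
mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
mydf$nestedlist <- list(myNestedList)

# Usernames per row as character vectors (assumes a fixed depth of 2)
users_per_row <- lapply(
  mydf$nestedlist,
  function(entries) {
    vapply(entries, function(e) e[["username"]], character(1), USE.NAMES = FALSE)
  }
)

# Repeat each node once per mention so that every edge gets its own row
edges <- data.frame(
  source = rep(mydf$Column1, lengths(users_per_row)),
  target = unlist(users_per_row),
  stringsAsFactors = FALSE
)

# Optionally write the edge list to disk (hypothetical file name)
# write.csv(edges, "edges.csv", row.names = FALSE)
```

Keeping one edge per row avoids having to split comma-separated strings again later.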