
Use dplyr to get list items from a data frame in R


I have a dataframe being returned from Microsoft365R:

SKA_student <- structure(list(name = "Computing SKA 2021-22.xlsx", size = 22266L, 
             lastModifiedBy = 
               structure(list(user = 
                      structure(list(email = "my@email.com", 
                                     id = "8ae50289-d7af-4779-91dc-e4638421f422", 
                                     displayName = "Name, My"), class = "data.frame", row.names = c(NA, -1L))), 
                      class = "data.frame", row.names = c(NA, -1L)), 
             fileSystemInfo = structure(list(
               createdDateTime = "2021-09-08T16:03:38Z", 
               lastModifiedDateTime = "2021-09-16T00:09:04Z"), class = "data.frame", row.names = c(NA,-1L))), row.names = c(NA, -1L), class = "data.frame")

I can return all the lastModifiedBy data through:

SKA_student %>% select(lastModifiedBy)

lastModifiedBy.user.email               lastModifiedBy.user.id lastModifiedBy.user.displayName
1              my@email.com 8ae50289-d7af-4779-91dc-e4638421f422                        Name, My

But if I want a specific item in the lastModifiedBy list, it doesn't work, e.g.:

SKA_student %>% select(lastModifiedBy.user.email)

Error: Can't subset columns that don't exist.
x Column `lastModifiedBy.user.email` doesn't exist.

I can get this working in base R, but I would really like a dplyr answer.


Solution

  • This function flattens all the list columns (I found it ages ago on SO but can't find the original post to credit it):

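    SO_flat_cols <- function(data) {
        # Flag list columns; nested data frames count, since a data frame is a list
        ListCols <- sapply(data, is.list)
        # Unlist each row of the list columns and bind the pieces back on as ordinary columns
        cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
    }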
    

    Then you can select as you like.

    SO_flat_cols(SKA_student) %>%
      select(lastModifiedBy.user.email)
    

    Alternatively, you can drill down by pulling out each nested level in turn:

    SKA_student %>%
      pull(lastModifiedBy) %>%
      pull(user) %>%
      select(email)
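
For what it's worth, the reason the original select() fails is that lastModifiedBy.user.email is only the label print() gives to a column inside a nested data frame; the real top-level column is just lastModifiedBy. Below is a rough sketch of three other ways to reach the nested value, not necessarily the best ones. It assumes the tidyr and jsonlite packages are installed (neither is mentioned in the question), and it uses a trimmed copy of the question's data with the same nesting but fewer fields.

```r
library(dplyr)
library(tidyr)  # assumption: not used in the question, needed for unpack()

# Trimmed copy of the question's data: same nesting, fewer fields.
SKA_student <- structure(
  list(
    name = "Computing SKA 2021-22.xlsx",
    lastModifiedBy = structure(
      list(
        user = structure(
          list(email = "my@email.com", displayName = "Name, My"),
          class = "data.frame", row.names = c(NA, -1L)
        )
      ),
      class = "data.frame", row.names = c(NA, -1L)
    )
  ),
  class = "data.frame", row.names = c(NA, -1L)
)

# The dotted names seen when printing are not real column names.
names(SKA_student)

# Option 1 (dplyr only): reach through the nested data frames with `$`.
email_only <- SKA_student %>%
  transmute(email = lastModifiedBy$user$email)

# Option 2 (tidyr): unpack one nesting level per call, so the dotted
# names become real columns that select() can find.
unpacked <- SKA_student %>%
  unpack(lastModifiedBy, names_sep = ".") %>%
  unpack(lastModifiedBy.user, names_sep = ".")
unpacked %>% select(lastModifiedBy.user.email)

# Option 3 (jsonlite, assumed installed): flatten every level in one call.
flat <- jsonlite::flatten(SKA_student)
flat %>% select(lastModifiedBy.user.email)
```

Option 1 stays within dplyr, which is what the question asked for. Options 2 and 3 are closer in spirit to SO_flat_cols, but they keep each column's original type rather than going through a matrix.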