
Mongolite does not insert dataframe with list column correctly into Mongo DB


Here is a short, reproducible example of an issue I am having when inserting data from R into a Mongo database. It is challenging because, as you will see, I have a nested column of data. Fixing this is pivotal to my database, and I think it is a problem that others could run into as well.

My Data:

my.data <- structure(list(`_id` = c(10138L, 9466L, 9390L),
                          firstName = c("Alex", "Quincy", "Steven"),
                          lastName = c("Abrines", "Acy", "Adams"),
                          birthCity = c("Palma de Mallorca", "Tyler, TX", "Rotorua"),
                          birthCountry = c("Spain", "USA", "New Zealand")),
                     row.names = c(NA, 3L), class = "data.frame")

> my.data
    _id firstName lastName         birthCity birthCountry
1 10138      Alex  Abrines Palma de Mallorca        Spain
2  9466    Quincy      Acy         Tyler, TX          USA
3  9390    Steven    Adams           Rotorua  New Zealand

inner.df <- structure(list(jerseyNumber = 40L, weight = 240L, age = 21L), class = "data.frame", row.names = 485L)

num.vector <- c(1,3,5,7)

My goal with the above is twofold:

  • add a 4th column to inner.df that holds num.vector
  • add inner.df as a 6th column of my.data, with one copy of inner.df per row

... and here is the code I use to do that:

library(dplyr)  # provides %>% and mutate()

# add a list of the numbers to inner df
inner.df$shotIDs = list(num.vector)  

# create allmonths column (the column where the inner.df's will be placed)
my.data <- my.data %>%
  dplyr::mutate(allmonths = NA)

# convert allmonths into a column of class == list
my.data$allmonths[1] = list(placeholder = NA)

# For EACH row in my main my.data dataframe, add the inner.df to the allmonths column/key
for(i in 1:nrow(my.data)) {
  my.data$allmonths[[i]] <- inner.df
}

# Write this to my mongo db
con <- mongolite::mongo(collection = 'mycoll', db = 'mydb', url = "myurl")
con$insert(my.data) # this is not a good way to update a db

Here is my result of this (showing from Robo 3T):

[Screenshot of the inserted documents in Robo 3T; image not available]

I am SO SO close with this, but for some reason allmonths is a length-1 array, rather than its own object. If allmonths were an object with 4 fields, with the exact same values as the object labeled [0], then this would be much better.

Does anybody see what's wrong in my attempt here? I'm sure this is a problem that others may have run into when working with nested objects in R! Any help is super appreciated!


Solution

  • To get the object { } your allmonths needs to be a column of type data.frame, not list.

    Taking your example

    library(dplyr)
        
    my.data <- structure(list(`_id` = c(10138L, 9466L, 9390L),
                              firstName = c("Alex", "Quincy", "Steven"),
                              lastName = c("Abrines", "Acy", "Adams"),
                              birthCity = c("Palma de Mallorca", "Tyler, TX", "Rotorua"),
                              birthCountry = c("Spain", "USA", "New Zealand")),
                         row.names = c(NA, 3L), class = "data.frame")
        
        
    my.data
        
    inner.df <- structure(list(jerseyNumber = 40L, weight = 240L, age = 21L), class = "data.frame", row.names = 485L)
        
    num.vector <- c(1,3,5,7)
        
    # add a list of the numbers to inner df
    inner.df$shotIDs = list(num.vector)  
    

    Now append inner.df as a column, repeating its single row three times so that it has one row for each row of my.data:

    my.data$allmonths <- inner.df[rep(1,3), ]
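
To see what the column type changes, here is a minimal sketch that serializes the same nested values two ways with jsonlite::toJSON. The names outer, inner, as_list and as_df are made up for this illustration, and the data is cut down to two rows and two fields. The list column should print allmonths as a length-1 array `[ { ... } ]`, while the data.frame column should print it as a plain object `{ ... }`.

```r
library(jsonlite)

outer <- data.frame(id = 1:2)
inner <- data.frame(jerseyNumber = 40L, weight = 240L)

# Variant 1: list column, each cell holds its own 1-row data frame
# (this is the shape the for-loop in the question builds)
as_list <- outer
as_list$allmonths <- list(inner, inner)

# Variant 2: data.frame column, one row of the nested frame per outer row
nested <- inner[rep(1, nrow(outer)), ]
rownames(nested) <- NULL
as_df <- outer
as_df$allmonths <- nested

json_list <- toJSON(as_list)
json_df   <- toJSON(as_df)

json_list
json_df
```

Using nrow(outer) instead of a hard-coded 3 keeps the repeat count in step with the outer data frame.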
    

    If you then view the JSON it produces, you can see that allmonths is now an { } object:

    substr( jsonlite::toJSON( my.data ), 1, 196 )
    # [{"_id":10138,"firstName":"Alex","lastName":"Abrines","birthCity":"Palma de Mallorca","birthCountry":"Spain",
    # "allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485"}
    # } 
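
The JSON above carries an extra "_row" field because inner.df[rep(1,3), ] ends up with the character row names "485", "485.1" and "485.2". If you would rather not have that field, a minimal sketch is to reset the row names before nesting the data frame. Whether mongolite writes "_row" into the collection exactly as jsonlite::toJSON prints it is an assumption not checked here, so inspect an inserted document to confirm. The names nested and json_nested are illustrative only.

```r
library(jsonlite)

inner.df <- structure(list(jerseyNumber = 40L, weight = 240L, age = 21L),
                      class = "data.frame", row.names = 485L)
inner.df$shotIDs <- list(c(1, 3, 5, 7))

# Repeat the single row three times; row names become "485", "485.1", "485.2"
nested <- inner.df[rep(1, 3), ]

# Reset to automatic row names so there are no named rows left to export
rownames(nested) <- NULL

json_nested <- toJSON(nested)
json_nested
```

After this, my.data$allmonths <- nested can be used in place of the direct assignment.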
    
    

    Aside

    It's often helpful to construct the JSON you're after, then call fromJSON to see the R structure you should be aiming for:

    js <- '
    [{"_id":10138,"firstName":"Alex","lastName":"Abrines","birthCity":"Palma de Mallorca","birthCountry":"Spain","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485"}},{"_id":9466,"firstName":"Quincy","lastName":"Acy","birthCity":"Tyler, TX","birthCountry":"USA","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485.1"}},{"_id":9390,"firstName":"Steven","lastName":"Adams","birthCity":"Rotorua","birthCountry":"New Zealand","allmonths":{"jerseyNumber":40,"weight":240,"age":21,"shotIDs":[1,3,5,7],"_row":"485.2"}}] 
    '
    str( jsonlite::fromJSON( js ) )
        
    # 'data.frame': 3 obs. of  6 variables:
    #   $ _id         : int  10138 9466 9390
    #   $ firstName   : chr  "Alex" "Quincy" "Steven"
    #   $ lastName    : chr  "Abrines" "Acy" "Adams"
    #   $ birthCity   : chr  "Palma de Mallorca" "Tyler, TX" "Rotorua"
    #   $ birthCountry: chr  "Spain" "USA" "New Zealand"
    #   $ allmonths   :'data.frame':    3 obs. of  4 variables:
    #   ..$ jerseyNumber: int  40 40 40
    #   ..$ weight      : int  240 240 240
    #   ..$ age         : int  21 21 21
    #   ..$ shotIDs     :List of 3
    #   .. ..$ : int  1 3 5 7
    #   .. ..$ : int  1 3 5 7
    #   .. ..$ : int  1 3 5 7
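
In practice each row will probably need its own nested values rather than three copies of the same inner.df. One possible sketch, in base R only, stacks a list of one-row data frames into a single data frame with do.call(rbind, ...) and assigns that as the column, which gives the same nested 'data.frame' shape that str() shows above. The players and per_player objects and all values in them are made up for illustration, and only atomic columns are used, so list columns such as shotIDs are not exercised here.

```r
# Hypothetical outer data frame with three rows
players <- data.frame(id = 1:3)

# Hypothetical per-row values, one 1-row data frame per outer row
per_player <- list(
  data.frame(jerseyNumber = 8L,  weight = 190L, age = 25L),
  data.frame(jerseyNumber = 13L, weight = 240L, age = 28L),
  data.frame(jerseyNumber = 12L, weight = 255L, age = 24L)
)

# Stack the one-row data frames into a single 3-row data frame
nested <- do.call(rbind, per_player)

# Assign it as a data.frame column (not a list column)
players$allmonths <- nested

str(players)
```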