
R and MongoDB: Array is stored as an object with indexes as keys


I'm pulling JSON data from a provider and adding it to MongoDB using R. I plan to use R and Shiny to display the data in the future. My current problem is that when I convert the data to a JSON object and insert it into MongoDB, the object is added, but the data ends up one level deeper than I want.

Here is how the data comes in:

prettify(jsonKill)
[
    {
        "id" : {
            "timestamp" : 1409785080,
            "machine" : 11966932,
            "pid" : 3144,
            "increment" : 11720074,
            "creationTime" : "2014-09-03T22:58:00Z"
        },
    ...
]

Here is my code that adds it to mongodb:

library('jsonlite')
library('rmongodb')

m <- mongo.create()
ns <- 'database.collection'
killObject <- fromJSON('http://omitted.because.nda:8000/api/omit')
n <- nrow(killObject)
for (i in seq_len(n)) {
  # Serialize one row of the data frame and insert it
  jsonKill <- toJSON(killObject[i, ])
  bson <- mongo.bson.from.JSON(jsonKill)
  mongo.insert(m, ns, bson)
  print(paste("Inserting Record:", i))
}
# Query using the last inserted document as the filter
cursor <- mongo.find(m, ns, bson)
while (mongo.cursor.next(cursor)) {
  value <- mongo.cursor.value(cursor)
  record <- mongo.bson.to.list(value)  # avoid masking base::list
  str(record)
}

Here is the result:

{
    "_id" : ObjectId("54081299d5ec83d046d05766"),
    "1" : {
        "id" : {
            "timestamp" : 1409756219,
            "machine" : 2364985,
            "pid" : 9076,
            "increment" : 1079972,
            "creationTime" : "2014-09-03T14:56:59Z"
        },
    ...
}

What I'm aiming for is to run a query such as db.collection.find({"id.pid" : {$gt : 1}}), or to create an index with mongo.index.create(m, ns, "id.pid", mongo.index.unique), or something to that effect. It need not be the id key; it could be one or more of the keys not displayed here.


Solution

  • The reason is a bug that rmongodb currently has: mongo.bson.from.JSON stores a JSON array as an object whose keys are the element indexes, which hampers working with arrays.


    R:

    library(rmongodb)
    
    m <- mongo.create()
    
    json <- '{"array":[{"a":1},{"b":2}]}'
    bson <- mongo.bson.from.JSON(json)
    
    mongo.insert(m, "database.collection", bson)
    

    MongoDB shell:

    > db.collection.find().pretty()
    {
            "_id" : ObjectId("540825d68a271f234b6d62d2"),
            "array" : {
                    "1" : {
                            "a" : 1
                    },
                    "2" : {
                            "b" : 2
                    }
            }
    }
    

    To work around this issue, I developed a package (rmongodbHelper):

    R:

    library(devtools)
    install_github("joyofdata/rmongodbHelper")
    library(rmongodbHelper)
    
    json <- '{"array":[{"a":1},{"b":2}]}'
    bson <- rmongodbHelper::json_to_bson(json)
    
    mongo.insert(m, "database.collection", bson)
    

    MongoDB shell:

    > db.collection.find().pretty()
    {
            "_id" : ObjectId("540826738a271f234b6d62d4"),
            "array" : [
                    {
                            "a" : 1
                    },
                    {
                            "b" : 2
                    }
            ]
    }
    

    You can find further information on this package and on using MongoDB with R on my website:

    MongoDB - State of the R


    Keep in mind that MongoDB cannot store bare arrays, only objects, which themselves may contain arrays.
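
This also bears on the code in the question: by default jsonlite serializes a data frame, even a single row of one, as a JSON array of row objects, so toJSON(killObject[i:i,]) hands mongo.bson.from.JSON a bare array, which would explain the extra "1" level in the result. Below is a minimal sketch of one way to produce a bare object per record instead, assuming the feed can be parsed with simplifyVector = FALSE. The sample string raw (including its kills field) and the names records and docs are made up for illustration, and the sketch runs without a database connection.

```r
library(jsonlite)

# Hypothetical stand-in for the provider's response (the real feed is elided above)
raw <- '[{"id":{"pid":3144},"kills":2},{"id":{"pid":9076},"kills":5}]'

# Default parsing gives a data frame; one row of it still serializes as an array
df <- fromJSON(raw)
row_json <- as.character(toJSON(df[1, ]))

# simplifyVector = FALSE keeps every record as a nested named list instead
records <- fromJSON(raw, simplifyVector = FALSE)

# auto_unbox = TRUE writes length-one vectors as scalars, so each record
# becomes a single JSON object rather than an array
docs <- vapply(
  records,
  function(rec) as.character(toJSON(rec, auto_unbox = TRUE)),
  character(1)
)

substr(row_json, 1, 1)  # "["
substr(docs[1], 1, 1)   # "{"
```

Each element of docs could then be passed to mongo.bson.from.JSON() and mongo.insert() as in the question. Arrays nested inside a record would presumably still be affected by the bug described above.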