Search code examples
rmongodbrmongodb

Running advanced MongoDB queries in R with rmongodb


As MySQL is driving me nuts I'm trying to make myself acquainted with my first "NoSQL" DBMS and it happened to be MongoDB. I'm connecting to it via rmongodb.

The more I play around with rmongodb, the more questions/problems come up with respect to running advanced queries.

First I present some example data before I go into detail about the different types of queries that I can't seem to specify correctly.

Example Data

The example is taken from the MongoDB website and has been simplified a bit.

pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)

# Insert document to collection 'test.users'
b <- mongo.bson.from.list(list(
    "_id"="alex", 
    name=list(first="Alex", last="Benisson"),
    karma=1.0,
    age=30,
    test=c("a", "b")
))
mongo.insert(mongo, "test.users", b)

# Insert document to collection 'test.posts'
b <- mongo.bson.from.list(list(
        "_id"="abcd",
        when=mongo.timestamp.create(strptime("2011-09-19 02:00:00",
            "%Y-%m-%d %H:%M:%s"), increment=1),
        author="alex",
        title="Some title",
        text="Some text.",
        tags=c("tag.1", "tag.2"),
        votes=5,
        voters=c("jane", "joe", "spencer", "phyllis", "li"),
        comments=list(
            list(
                who="jane", 
                when=mongo.timestamp.create(strptime("2011-09-19 04:00:00",
                    "%Y-%m-%d %H:%M:%s"), increment=1),
                comment="Some comment."
            ),
            list(
                who="meghan", 
                when=mongo.timestamp.create(strptime("2011-09-20 13:00:00",
                    "%Y-%m-%d %H:%M:%s"), increment=1),
                comment="Some comment."
            )
        )
    )
)
b
mongo.insert(mongo, "test.posts", b)

Two questions related to inserting JSON/BSON objects:

  1. Document 'test.posts', field voters: is it correct to use c() in this case?
  2. Document 'test.posts', field comments: what's the right way to specify this, c() or list()?

Top Level Queries: they work a treat

Top level queries work just fine:

# Get all posts by 'alex' (only titles)
res <- mongo.find(mongo, "test.posts", query=list(author="alex"), 
    fields=list(title=1L))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
[[1]]
                       _id                      title 
                     "abcd"            "No Free Lunch" 

Question 1: Basic Sub Level Queries

How can run a simple "sub level queries" (as opposed to top level queries) that need to reach into arbitrarily deep sublevels of a JSON/BSON style MongoDB object? These sub level queries make use of MongoDB's dot notation and I can't seem to figure out how to map that to a valid rmongodb query

In plain MongoDB syntax, something like

> db.posts.find( { comments.who : "meghan" } )

would work. But I can't figure out how to do that with rmongodb functions

Here's what I tried so far

# Get all comments by 'meghan' from 'test.posts'

#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

#--------------------
# Approach 2) 
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
    out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))

> out
NULL
# Does not work

Question 2: Queries Using $ Operators

These work

Query 1

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "age")
mongo.bson.buffer.append(buf, "$lte", 30)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria

> mongo.find.one(mongo, "test.users", query=criteria)
    _id : 2      alex
    name : 3     
        first : 2    Alex
        last : 2     Benisson

    karma : 1    1.000000
    age : 1      30.000000
    test : 4     
        0 : 2    a
        1 : 2    b

Query 2

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.append(buf, "$in", c("a", "z"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria
mongo.find.one(mongo, "test.users", query=criteria)

However, notice that an atomic set will result in a return value of NULL

mongo.bson.buffer.append(buf, "$in", "a")
# Instead of 'mongo.bson.buffer.append(buf, "$in", c("a", "z"))'

Trying the same with sub level queries I'm lost again

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name")
mongo.bson.buffer.start.object(buf, "first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria <- mongo.bson.from.buffer(buf)
> criteria
    name : 3     
        first : 3    
            $in : 4      
                0 : 2    Alex
                1 : 2    Horst

> mongo.find.one(mongo, "test.users", query=criteria)
NULL

Solution

  • Either c() or list() can be ok. Depends on whether the components are named and whether they all have the same type (for list). Best thing to do is look at the generated BSON and see if you are getting what you want. For the best control of the generated object use mongo.bson.buffer and the functions that operate on it. In fact this is why the sub-queries are failing. 'comments' is being created as a subobject rather than an array. mongo.bson.from.list() is handy but it doesn't give you the same control and sometimes it guesses wrong about what to generate from complicated structures.

    The query on the other set of data can be corrected like so though:

    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.start.object(buf, "name.first")
    mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
    mongo.bson.buffer.finish.object(buf)
    criteria <- mongo.bson.from.buffer(buf)
    

    Note that you definitely need to use a buffer here since R will choke on the dotted name.

    I hope this straightens out your problem. Let me know if you have any further questions.