Search code examples
javascriptforerunnerdb

How can I select the latest version of an object from a ForerunnerDb collection


I have a collection which contains a series of objects generated over time. Since I have disparate types stored in the same collection, I have a TypeId and a UID per object (where the UID identifies objects that refer to the same entity over time). I am trying to choose the most recent object from the collection, and running into serious difficulties grasping how to do so without manually iterating a query result - something I'd rather avoid since I think it could become expensive when the collection gets larger.

For example:

var db; // assigned elsewhere
var col = db.collection("res");
col.primaryKey("resId");
col.insert({
    resId: 1,
    TypeId: "Person",
    UID: "Bob",
    Data: {Age: 20, Name:Bob}
});
col.insert({
    resId: 2,
    TypeId: "Person",
    UID: "Bob",
    Data: {Age: 25, Name:Bob}
});
col.insert({
    resId: 3,
    TypeId: "Car",
    UID: "TeslaModelX",
    Data: {Manufacturer: "Tesla", Owner:"Bob"}
});
col.insert({
    resId: 4,
    TypeId: "Person",
    UID: "Bill",
    Data: {Age: 22, Name:Bill}
});

From col, I want the query to select all objects with TypeId="Person" ranked by resId descending, i.e. I'd expect to select objects 4 and 2, in that order.

The collection above is contrived, but in reality I'd expect there to be certainly 000s of entries and potentially 0000s, with maybe 00s of versions of each UID. In other words, I'd rather not return the full collection of objects, grouped or otherwise, and iterate it.

I have tried the following, but since the $distinct operator is applied before the $orderBy one, this doesn't help:

col.find(
    {
        TypeId : {$eq : "Person"}
        $distinct: { UID: 1}
    },
    {
        $orderBy: {
            resId : -1
        }
    }
    );

I have in mind that I should be able to use the $groupBy, $limit and $aggregate clauses to identify the per group desired IDs, and then use a subquery to find the precise (non-aggregated) elements, but as yet I haven't managed to get anything to do what I want. Any ideas?

My current solution is to include a Deleted property amongst my objects, and set it to true for all existing non-deleted objects in the DB before I insert new entries. This lets me do what I want but also stops me from, for instance, choosing the best available within a known timeframe or similar.


Solution

  • You can do this like:

    var tmpObj = {};
    col.sort({resId: -1}, coll.find({
        "TypeId": "Person"
    })).filter(function (doc) {
        return col._match(doc, {
            $distinct: {
                UID: 1
            }
        }, {}, 'and', tmpObj);
    });
    

    It's a bit dirty since it's not neatly wrapped up in a single command, but it's as clean as you'll get it in ForerunnerDB v1.x.

    Version 2 will have a new query pipeline system that would allow for exactly this sort of usage, something like this:

    col.pipeline()
        .find({"TypeId": "Person"})
        .orderBy({"resId": 1})
        .distinct({"UID": 1})
        .then(function (err, data) {
            console.log(data);
        });
    

    Source: I'm the developer of ForerunnerDB.