I have a collection which contains a series of objects generated over time. Since I have disparate types stored in the same collection, I have a TypeId
and a UID
per object (where the UID identifies objects that refer to the same entity over time). I am trying to choose the most recent object from the collection, and running into serious difficulties grasping how to do so without manually iterating a query result - something I'd rather avoid since I think it could become expensive when the collection gets larger.
For example:
var db; // assigned elsewhere
var col = db.collection("res");
col.primaryKey("resId");
col.insert({
resId: 1,
TypeId: "Person",
UID: "Bob",
Data: {Age: 20, Name:Bob}
});
col.insert({
resId: 2,
TypeId: "Person",
UID: "Bob",
Data: {Age: 25, Name:Bob}
});
col.insert({
resId: 3,
TypeId: "Car",
UID: "TeslaModelX",
Data: {Manufacturer: "Tesla", Owner:"Bob"}
});
col.insert({
resId: 4,
TypeId: "Person",
UID: "Bill",
Data: {Age: 22, Name:Bill}
});
From col, I want the query to select all objects with TypeId="Person"
ranked by resId
descending, i.e. I'd expect to select objects 4 and 2, in that order.
The collection above is contrived, but in reality I'd expect there to be certainly 000s of entries and potentially 0000s, with maybe 00s of versions of each UID. In other words, I'd rather not return the full collection of objects, grouped or otherwise, and iterate it.
I have tried the following, but since the $distinct
operator is applied before the $orderBy
one, this doesn't help:
col.find(
{
TypeId : {$eq : "Person"}
$distinct: { UID: 1}
},
{
$orderBy: {
resId : -1
}
}
);
I have in mind that I should be able to use the $groupBy
, $limit
and $aggregate
clauses to identify the per group desired IDs, and then use a subquery to find the precise (non-aggregated) elements, but as yet I haven't managed to get anything to do what I want. Any ideas?
My current solution is to include a Deleted
property amongst my objects, and set it to true for all existing non-deleted objects in the DB before I insert new entries. This lets me do what I want but also stops me from, for instance, choosing the best available within a known timeframe or similar.
You can do this like:
var tmpObj = {};
col.sort({resId: -1}, coll.find({
"TypeId": "Person"
})).filter(function (doc) {
return col._match(doc, {
$distinct: {
UID: 1
}
}, {}, 'and', tmpObj);
});
It's a bit dirty since it's not neatly wrapped up in a single command, but it's as clean as you'll get it in ForerunnerDB v1.x.
Version 2 will have a new query pipeline system that would allow for exactly this sort of usage, something like this:
col.pipeline()
.find({"TypeId": "Person"})
.orderBy({"resId": 1})
.distinct({"UID": 1})
.then(function (err, data) {
console.log(data);
});
Source: I'm the developer of ForerunnerDB.