I have a system that accepts status updates from a variety of unique sources, and each status update creates a new document in the following structure:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023123123,
"location": "Boise, ID"
}
Data purely example, but gets the idea across.
Now, these documents are generated at interval, once an hour or so. An hour later, we might the insert:
{
"type": "status_update",
"source_id": "truck1231",
"timestamp": 13023126723,
"location": "Madison, WI"
}
All I'm interested in doing is seeing the latest update from each unique source. I'm currently doing this by taking a map of:
function(doc) {
if (doc.type == "status_update") {
emit(doc.source_id, doc);
}
}
And a reduce of:
function(keys, values, rereduce) {
var winner = values[0];
var i = values.length;
while (i--) {
var val = values[i];
if (val.timestamp > winner.timestamp) winner = val;
}
return winner;
}
And querying the data as a reduce with group=true
. This works as expected and provides a keyed result of only the latest updates.
The problem is it's terribly slow and requires me to reduce_limit=false
in the CouchDB config.
It feels like there has to be a more efficient way of doing this. Updating the same document is not an option-- the history is important even though I don't require it in this instance. Processing the data client-side isn't an option either as this is a CouchApp and the amount of documents in the system is actually quite large and not practical to send them all over the wire.
Thanks in advance.
You can get the latest timestamp for every source using the _stats
built-in reduce function, then do another query to get the documents. Here's the views:
"views": {
"latest_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
"reduce": "_stats"
},
"status_update": {
"map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
}
}
First query latest_update
with group=true
, then status_update
with something like (properly url-encoded):
keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true
where TS123, and TS234 are the values of max
returned by latest_update
.