Quite simply, I need to store time series data in a document. I have decided that having a document responsible for a 30 minute period of data is reasonable. This is only one of a few hundred to a few thousand documents that will be updated every second. The document could look like this:
{
_id: "APAC.tky001.cpu.2011.12.04:10:00",
field1: XX,
field2: YY,
1322971800: 22,
1322971801: 23,
1322971802: 21,
// and so on
}
This means that every 30 minutes, I create the document with _id, field1 and field2. Then, every second I would like to add a timestamp/value combination.
I am using the mongo c library. I was assuming it would be superfast, but the way I am doing this requires a mongo_update, which cannot be done in bulk. I don't think there's a way to use mongo_insert_batch.
Unfortunately, it's super slow - terrible performance. Am I doing this completely incorrectly? By terrible, I mean that in a crude test I get 600/second, whereas in an alternate db (not naming names) I get 27,000/sec.
The code is approximately:
for (i = 0; i < N; i++) {
    if (mongo_update(c, n, a, b, MONGO_UPDATE_UPSERT, write_concern) != MONGO_OK) {
        // stuff
    }
}
Setting write concern off or on makes no difference.
Your updates are likely to grow the document beyond the space allocated for it each time. This means that an update is no longer cheap, because mongo has to copy the document to a new location. You could manually pad documents by inserting some large dummy value when creating the document and removing it later, so that your updates happen in-place. I'm not sure if you can manipulate the collection-level paddingFactor directly.
In that other unnamed database you probably insert a row per entry, which is a totally different operation from what you are doing here.
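One variant of the manual padding idea is to create each 30-minute document with every per-second field already present and holding a placeholder, so that later updates overwrite a value rather than add a field. Below is a minimal sketch of just the key generation, using only the C standard library. The names bucket_start, prefill_keys, BUCKET_SECONDS and KEY_LEN are hypothetical, the bucket alignment to multiples of 1800 seconds is an assumption, and the actual driver calls that would build and insert the document are left out.

```c
#include <stdio.h>
#include <stddef.h>

#define BUCKET_SECONDS 1800  /* one document covers 30 minutes (assumption) */
#define KEY_LEN 24           /* enough room for a decimal Unix timestamp */

/* Hypothetical helper: first second of the 30-minute bucket containing ts. */
static long bucket_start(long ts) {
    return ts - (ts % BUCKET_SECONDS);
}

/*
 * Hypothetical helper: writes the field name for every second of the bucket
 * containing ts into keys[], up to max_keys entries, and returns how many
 * were written. Each key would be added to the new document with a
 * placeholder value when the document is first created.
 */
static int prefill_keys(long ts, char keys[][KEY_LEN], int max_keys) {
    long start = bucket_start(ts);
    int n = 0;
    for (long t = start; t < start + BUCKET_SECONDS && n < max_keys; t++, n++) {
        snprintf(keys[n], KEY_LEN, "%ld", t);
    }
    return n;
}
```

With the fields in place, each per-second update could use a $set on an existing key with a value of the same type. Whether that actually keeps the update in-place depends on how the server stores the document, so it is worth measuring rather than assuming.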