mongodb, real-time, analytics

How can I efficiently use MongoDB to create real-time analytics with pivots?


I'm continuously receiving a large volume of data that gets inserted into a processedData collection. A document looks like this:

{
    date: "2011-12-4",
    time: 2243,
    gender: {
        males: 1231,
        females: 322
    },
    age: 32
}

Documents like this keep arriving continually. I want to be able to see all "males" that are above 40 years old. This query doesn't seem to be efficient because of the sheer size of the data.

Any tips?


Solution

  • Generally speaking, you can't.

    However, there may be some shortcuts, depending on your actual requirements. Do you want to count 'males above 40' across the whole dataset, or just for one day?

    1 day: split your data into daily collections (processedData-20111121, ...), so that each query only has to scan one day's documents. You can also cache the results of such queries.
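
    To illustrate the daily-collection idea, here is a minimal sketch of a naming helper. The function name dailyCollectionName and the zero-padding of month and day are my assumptions, not something from the question; the shell calls are left as comments because they need a live db handle.

    ```javascript
    // Hypothetical helper: maps a document's date string (e.g. "2011-12-4")
    // to the name of its daily collection (e.g. "processedData-20111204").
    function dailyCollectionName(dateString, prefix) {
      var parts = dateString.split("-").map(Number); // [year, month, day]
      var pad = function (n) { return (n < 10 ? "0" : "") + n; };
      return (prefix || "processedData") + "-" + parts[0] + pad(parts[1]) + pad(parts[2]);
    }

    // Shell usage (assumes a `db` handle, so shown as comments only):
    //   var name = dailyCollectionName(doc.date);
    //   db.getCollection(name).insertOne(doc);
    //   db.getCollection(name).find({ age: { $gt: 40 } });
    ```

    Because the names contain a hyphen, they have to be accessed through db.getCollection(name) rather than as db.name.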

    whole dataset: pre-aggregate the data. That is, whenever a new data entry is inserted, do something like this:

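    // Bump a running counter; 1231 is the "males" count of the new entry.
    db.preaggregated.update({_id : 'male_40'},
         {$set : {gender : 'm', age : 40}, $inc : {count : 1231}},
         {upsert : true});  // create the counter document if it is missing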
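
    The same update can be derived from each incoming document instead of being hard-coded. This is only a sketch: the helper name preaggregationOps and the "<gender>_<age>" key scheme (one counter per exact age) are assumptions on my part, and you may prefer coarser buckets.

    ```javascript
    // Hypothetical helper: turns one raw document into the upserts that keep
    // per-gender, per-age counters current. The key scheme is an assumption.
    function preaggregationOps(doc) {
      var counts = { male: doc.gender.males, female: doc.gender.females };
      return Object.keys(counts).map(function (gender) {
        return {
          filter: { _id: gender + "_" + doc.age },
          update: {
            $set: { gender: gender.charAt(0), age: doc.age },
            $inc: { count: counts[gender] }
          }
        };
      });
    }

    // Shell usage (assumes a `db` handle, so shown as comments only):
    //   preaggregationOps(doc).forEach(function (op) {
    //     db.preaggregated.update(op.filter, op.update, { upsert: true });
    //   });
    // "Males above 40" then only has to read a few counter documents:
    //   db.preaggregated.find({ gender: "m", age: { $gt: 40 } });
    ```

    With counters like these, the expensive scan over raw data is replaced by summing the count field of the matching counter documents.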
    

    Similarly, if you know all your queries beforehand, you can just precalculate them (and not keep raw data).

    It also depends on how you define "real-time" and how big a query load you will have. In some cases it is ok to just fire ad-hoc map-reduces.
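
    For the ad-hoc route, a map-reduce for 'males above 40' could look roughly like the sketch below. The names mapMales, reduceSum and runLocally are made up for illustration; runLocally is just an in-memory stand-in so the functions can be tried without a server. In newer MongoDB versions the aggregation pipeline is generally preferred over mapReduce.

    ```javascript
    // Map: runs once per matching document; `this` is the document.
    function mapMales() {
      emit("males_over_40", this.gender.males);
    }

    // Reduce: sums the emitted counts for one key.
    function reduceSum(key, values) {
      return values.reduce(function (a, b) { return a + b; }, 0);
    }

    // In-memory stand-in for the server (an assumption for local testing).
    // `emit` is a global that MongoDB normally provides to the map function.
    var emit;
    function runLocally(docs, query, map, reduce) {
      var groups = {};
      emit = function (key, value) {
        (groups[key] = groups[key] || []).push(value);
      };
      docs.filter(query).forEach(function (doc) { map.call(doc); });
      var out = {};
      Object.keys(groups).forEach(function (key) {
        out[key] = reduce(key, groups[key]);
      });
      return out;
    }

    // Shell equivalent (assumes a `db` handle, so shown as comments only):
    //   db.processedData.mapReduce(mapMales, reduceSum,
    //     { query: { age: { $gt: 40 } }, out: { inline: 1 } });
    ```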