Search code examples
javascriptmongodbmeteorminimongo

Dynamic Frequency Map from MongoDB Keys


I'm using MiniMongo through Meteor, and I'm trying to create a frequency table based off of a dynamic set of queries.

I have two main fields, localHour and localDay. I expect many overlaps, and I'd like to determine where the most overlaps occur. My current method of doing this is so.

if(TempStats.findOne({
            localHour: hours,
            localDay: day
          })){//checks if there is already some entry on the same day/hour

            TempStats.update({//if so, we just increment frequency
              localHour: hours,
              localDay: day
            },{
              $inc: {freq: 1}
            })

          } else {//if nothing exists yet, we put in a new entry

            TempStats.insert({
              localHour: hours,
              localDay: day,
              freq: 1
            });

          }

Essentially, this code runs every time I have new data I want to insert. It works fine at the moment, in that, after all data is inserted, I can sort by frequency to find what set of hours & days occurs the most often (TempStats.find({}, {sort: {freq: -1}}).fetch()).

However, I'm looking more for a way to search by frequency for any key. For instance, searching for the day which everything occurs on the most often as opposed to both the date and hour. With my current way of doing this, I would need to have multiple databases and different methods of inserting for each, which is a bit ridiculous. Is there a Mongo (specifically MiniMongo) solution to do frequency maps based on keys?

Thanks!


Solution

  • This is basically a simple map-reduce problem.

    First, don't separate the derived data into 2 fields. This violates DB best practices. If the data comes to you this way, use it to create a Date object. I assume you have a bunch of collections that are being subscribed to and then you aggregate all those into this temporary local collection. This is the mapping of the map-reduce pattern. At this point, since your query in unknown, it's a waste of CPU (even though it's your client) to aggregate. Map first, reduce second. What you should have is a collection full of datetimes. call it TempMapCollection if you wish. Now, use a forEach() and pass in your reduce function (by day, by hour, etc).

    You can reduce into another local collection, or into a javascript object. I like using collections, but if the objects are complex, you'll get EJSON errors all up in there. Since your objects are nothing more than a datetime, let's use collections.

    so you've got something like:

    TempMapCollection.find().forEach(function(doc) {
        var date = doc.dateTime.getDate();
        TempReduceCollection.upsert({timequery: hours}, {$inc: {freq: 1}});
    })
    

    Now query your reduce collection. This has the added benefit that you won't have to re-map if you want to do 2 unique queries.