php, mongodb, mapreduce, lithium, nosql

MapReduce to Get the Most Popular Tags


I have a problem that I need some help with, but I feel I'm close. It involves Lithium and MongoDB. The code looks like this: http://pastium.org/view/0403d3e4f560e3f790b32053c71d0f2b

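$db = PopularTags::connection();

// Map: emit (term, 1) for every single term. Each element of saved_terms
// is itself an array of terms (or null), so both levels are walked.
$map = new \MongoCode("function() {
    if (!this.saved_terms) {
        return;
    }
    this.saved_terms.forEach(function(terms) {
        if (!terms) {
            return; // skip null entries
        }
        terms.forEach(function(term) {
            emit(term, 1);
        });
    });
}");

// Reduce: MongoDB calls this as reduce(key, values); sum the counts.
$reduce = new \MongoCode("function(key, values) {
    var count = 0;
    for (var i = 0; i < values.length; i++) {
        count += values[i];
    }
    return count;
}");

$metrics = $db->connection->command(array(
    'mapreduce' => 'users',
    'map'       => $map,
    'reduce'    => $reduce,
    'out'       => 'terms'
));

// Output documents look like { _id: <term>, value: <count> };
// sort by count so the most popular term comes first.
$cursor = $db->connection->selectCollection($metrics['result'])
    ->find()
    ->sort(array('value' => -1))
    ->limit(1);
print_r(iterator_to_array($cursor));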
/**
User data in Mongo:

{
    "_id" : ObjectId("4e789f954c734cc95b000012"),
    "email" : "[email protected]",
    "saved_terms" : [
        null,
        [
            "technology",
            " apple",
            " iphone"
        ],
        [
            "apple",
            " water",
            " beryy"
        ]
    ]
}
**/

I am saving the terms each user searches for, and then I am trying to get the most popular terms, but I keep getting errors like: Uncaught exception 'Exception' with message 'MongoDB::__construct( invalid name '. Does anyone have an idea how to do this, or some direction?
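To check what a mapReduce over this data should return, the server-side logic can be simulated in plain JavaScript under Node.js. This is a sketch, not MongoDB's engine: `simulateMapReduce` is a hypothetical helper that mimics how mapReduce collects emitted (key, value) pairs, groups them by key and calls reduce(key, values), and the second user is made up so that one term is counted across documents. The map function walks both levels of saved_terms, since each element is itself an array of terms or null.

```javascript
// Local simulation of MongoDB mapReduce semantics for this data.
// simulateMapReduce is a hypothetical helper, not part of any driver.

// MongoDB provides a global emit() inside map; here simulateMapReduce
// swaps in its own collector before running the map function.
let emit = null;

// Map (ES5 style, as MongoDB would run it): `this` is the user document.
function mapFn() {
  if (!this.saved_terms) {
    return;
  }
  this.saved_terms.forEach(function (terms) {
    if (!terms) {
      return; // skip null entries
    }
    terms.forEach(function (term) {
      emit(term, 1);
    });
  });
}

// Reduce: called as reduce(key, values); returns the summed count.
function reduceFn(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    count += values[i];
  }
  return count;
}

function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc));
  const out = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with a single emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  // Same ordering as find().sort({ value: -1 }) on the output collection.
  return out.sort((a, b) => b.value - a.value);
}

// The user from the question, plus a second, made-up user so that
// one term is counted across documents.
const users = [
  {
    saved_terms: [
      null,
      ["technology", " apple", " iphone"],
      ["apple", " water", " beryy"],
    ],
  },
  { saved_terms: [["apple", "iphone"]] }, // hypothetical extra user
];

const result = simulateMapReduce(users, mapFn, reduceFn);
console.log(result);
```

Because of the leading spaces in the stored terms, " apple" and "apple" come out as separate keys; trimming terms before they are saved avoids that.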


Solution

  • First off, I would not store this in the user object. MongoDB documents have a size limit of 4 MB or 16 MB, depending on the version. This limit is normally not a problem, but when you log inline in one document you might reach it. A more real problem, however, is that every time you need to act on these documents you have to load them into RAM, which gets expensive. I don't think you want that on your user objects.

    Secondly, arrays inside documents are hard to sort and query on their own, and they have other limitations that might come back to bite you later.

    But if you want to keep it like this (a low volume of searches should not really be a problem), the easiest way to solve it is with a group query. A group query works much like GROUP BY in SQL, so there is a slight trick: you need to group on something most documents share (an active field on users, maybe), so that all users fall into one group and the reduce function accumulates the counts across all of them.

    So here's a working group example that sums the words used, based on your structure. Just put this method in your model and call MyModel::searchTermUsage() to get a Document object back.

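    public static function searchTermUsage() {
        // group() reduce: runs once per document; prev is the group's accumulator.
        // The own-property check avoids inherited names such as "constructor".
        $reduce = 'function(obj, prev) {
            obj.saved_terms.forEach(function(search) {
                search.forEach(function(term) {
                    if (!Object.prototype.hasOwnProperty.call(prev, term)) prev[term] = 0;
                    prev[term]++;
                });
            });
        }';
        return static::all(array(
            'initial' => new \stdClass(),
            'reduce' => $reduce,
            'group' => 'common-value-key' // Change this to a key most documents share
        ));
    }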
    

    There is no protection against non-array values in the saved_terms field (you had a null value in your example); I removed it for simplicity. It's probably better to strip such values before they end up in the database.
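The group approach can also be checked locally. This is a plain JavaScript sketch for Node.js, not Lithium or the MongoDB driver, and it assumes a few hypothetical pieces: `simulateGroup` stands in for MongoDB's group command (one accumulator per distinct value of the group key, seeded from `initial`), `cleanSavedTerms` shows one way to strip non-array entries and stray whitespace before saving, and the `active` field is the shared key suggested in the answer, not something present in the sample data.

```javascript
// Hypothetical cleanup to run before saving: keep only arrays of
// non-empty, trimmed strings, so the reduce never sees null entries.
function cleanSavedTerms(savedTerms) {
  if (!Array.isArray(savedTerms)) return [];
  return savedTerms
    .filter((search) => Array.isArray(search))
    .map((search) =>
      search
        .filter((term) => typeof term === "string")
        .map((term) => term.trim())
        .filter((term) => term.length > 0)
    )
    .filter((search) => search.length > 0);
}

// The group reduce (ES5 style, as MongoDB would run it), reading
// saved_terms and using an own-property check for each counter.
function groupReduce(obj, prev) {
  obj.saved_terms.forEach(function (search) {
    search.forEach(function (term) {
      if (!Object.prototype.hasOwnProperty.call(prev, term)) prev[term] = 0;
      prev[term]++;
    });
  });
}

// Hypothetical stand-in for the group command: one accumulator per
// distinct value of `key`, each starting as a copy of `initial`.
// (The real command also copies the key fields into each result.)
function simulateGroup(docs, key, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const k = doc[key];
    if (!groups.has(k)) groups.set(k, Object.assign({}, initial));
    reduce(doc, groups.get(k));
  }
  return Array.from(groups.values());
}

// The user from the question with a hypothetical `active` flag to group on.
const users = [
  {
    active: true,
    saved_terms: [
      null,
      ["technology", " apple", " iphone"],
      ["apple", " water", " beryy"],
    ],
  },
];

const cleaned = users.map((user) =>
  Object.assign({}, user, { saved_terms: cleanSavedTerms(user.saved_terms) })
);
const result = simulateGroup(cleaned, "active", {}, groupReduce);
console.log(result);
```

Run on the raw document instead of the cleaned one, groupReduce throws a TypeError on the leading null, which is the missing protection mentioned above; after cleanup, "apple" and " apple" are also merged into a single count.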