
Order of Apache Mahout User-Based Recommender Results is non-deterministic


Currently, I am implementing the user-based recommender system from Mahout, see http://mahout.apache.org/users/recommender/userbased-5-minutes.html

Initially, I thought I could implement some kind of pagination: users query for the first page and get the first N items, then query for the second page and get the next N items, and so on. Since Mahout does not provide such functionality, I wanted to work around it by querying for N items on page 1, for 2*N items on page 2, etc., and then returning only the items that belong to the requested page.
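To make the idea concrete, here is a minimal sketch of that workaround. The `topN` function is a hypothetical stand-in for the actual recommender call (it is not a Mahout API), and the fake data source in `main` exists only so the example runs on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class PagedRecommendations {

    // Returns the 1-based page `page` of size `pageSize` by requesting the
    // top page*pageSize items and keeping only the last slice.
    // `topN` is a hypothetical stand-in for the real recommender call.
    static List<Long> page(IntFunction<List<Long>> topN, int page, int pageSize) {
        List<Long> all = topN.apply(page * pageSize);
        // Clamp both bounds in case fewer items come back than requested.
        int from = Math.min((page - 1) * pageSize, all.size());
        int to = Math.min(page * pageSize, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        // Fake, fully deterministic source: item IDs 100, 101, ... (25 in total).
        IntFunction<List<Long>> fake = n -> {
            List<Long> ids = new ArrayList<>();
            for (int i = 0; i < Math.min(n, 25); i++) {
                ids.add(100L + i);
            }
            return ids;
        };
        System.out.println(page(fake, 1, 10)); // items 100 to 109
        System.out.println(page(fake, 3, 10)); // [120, 121, 122, 123, 124]
    }
}
```

This only works if the first N items of a 2*N query are the same, and in the same order, as the items of an N query.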

However, when I run the recommender for, say, 10 items, and then run it again for 20 items, the order of the returned list is different, which makes it impossible for me to paginate. How is that possible? Shouldn't it return the same results when queried with the same data?

Note: The underlying data hasn't changed.


Solution

  • Most recommenders use a random process to downsample the data from which the model is calculated, so that the computation stays at O(n) complexity. If you want the downsampling to be deterministic, you can supply a fixed RNG seed value. How you do this depends on which packaging of the recommender you are using.

    Are you using the in-memory version, the Hadoop version, or the Spark version + search engine?

    The latest Mahout recommender code is fully integrated into event ingest, model calculation, and realtime serving with this version. Here, the RNG seed is set in the configuration file engine.json.
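As a rough illustration of why a fixed seed helps, the sketch below downsamples a list with `java.util.Random`. It is not Mahout code: the class `SeededDownsample` and its `sample` method are made up for this example, and the shuffle-and-truncate sampling is just one simple way to downsample. The point is only that two runs with the same seed over the same data pick the same subset in the same order, while an unseeded RNG generally does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededDownsample {

    // Picks up to `k` elements from `data` by shuffling a copy with the
    // given RNG and keeping the first k. The input list is not modified.
    static List<Integer> sample(List<Integer> data, int k, Random rng) {
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }

        // Same seed, same data: identical sample on every run.
        List<Integer> first = sample(data, 5, new Random(42L));
        List<Integer> second = sample(data, 5, new Random(42L));
        System.out.println(first.equals(second)); // true

        // No explicit seed: the sample usually changes from run to run.
        System.out.println(sample(data, 5, new Random()));
    }
}
```

Where the seed is actually set in a real deployment depends on the packaging, as noted above.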