mahout

How to reduce memory usage when using GenericItemSimilarity in Mahout (Taste)


As we know, GenericItemSimilarity precomputes the similarity between each pair of items (item1 and item2).

When we use GenericItemBasedRecommender to get recommendations, the recommender needs both the DataModel and the similarities in memory at the same time. GenericItemSimilarity offers a constructor like this:

  public GenericItemSimilarity(ItemSimilarity otherSimilarity, DataModel dataModel) throws TasteException {
        long[] itemIDs = GenericUserSimilarity.longIteratorToList(dataModel.getItemIDs());
        initSimilarityMaps(new DataModelSimilaritiesIterator(otherSimilarity, itemIDs));
  }

This constructor simply uses the DataModel to generate the similarity maps on the fly.

Is it necessary to store the similarity maps in a database or file?

I found that Mahout 0.7 has a class named FileItemItemSimilarityIterator, which can help read similarity maps from a file.

Are FileItemItemSimilarityIterator and AbstractJDBCInMemoryItemSimilarity (Mahout 0.5) redundant or useless, then?


Solution

  • You don't have to put the similarities in memory at all if they can be re-computed quickly on the fly.

    If not, I suggest you simply prune similarities that have small absolute value. These affect the computation the least.
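
    To illustrate the pruning idea, here is a minimal sketch in plain Java. It does not use the Mahout API: the `Entry` class is a hypothetical stand-in for one precomputed item-item similarity (loosely modelled on what GenericItemSimilarity.ItemItemSimilarity holds), and the `prune` method and the 0.1 threshold are assumptions chosen only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SimilarityPruning {

    /** Hypothetical stand-in for one precomputed item-item similarity. */
    public static final class Entry {
        public final long itemA;
        public final long itemB;
        public final double value;

        public Entry(long itemA, long itemB, double value) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.value = value;
        }
    }

    /**
     * Returns only the entries whose absolute similarity is at least minAbs.
     * Strongly negative similarities are kept, since they also carry signal.
     */
    public static List<Entry> prune(List<Entry> all, double minAbs) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : all) {
            if (Math.abs(e.value) >= minAbs) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> all = Arrays.asList(
            new Entry(1L, 2L, 0.92),
            new Entry(1L, 3L, -0.85),
            new Entry(2L, 3L, 0.03),
            new Entry(2L, 4L, -0.01));

        // The threshold is an illustrative assumption; tune it on your own data.
        List<Entry> kept = prune(all, 0.1);
        System.out.println("kept " + kept.size() + " of " + all.size());
    }
}
```

    Only the surviving entries would then need to be held in memory. How much this saves depends on how many of your similarities are close to zero, so it is worth measuring with a few thresholds before settling on one.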