Cassandra based Mahout user friend recommendations


I want to recommend to a user a list of other users whom that user could add as friends.

I am using Cassandra and Mahout. Mahout's integration package already contains a CassandraDataModel implementation, and I want to use that class.

My recommender class looks like this:

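import java.util.List;

import javax.inject.Inject;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.cassandra.CassandraDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrer;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserFriendsRecommender {

    @Inject
    private CassandraDataModel dataModel;

    public List<RecommendedItem> recommend(Long userId, int number) throws TasteException {
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        // Optional: infer missing preferences from each user's average preference.
        userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));

        // Use the 3 most similar users as the neighborhood.
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(3, userSimilarity, dataModel);
        Recommender recommender =
                new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);
        return cachingRecommender.recommend(userId, number);
    }
}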

CassandraDataModel uses four column families:

static final String USERS_CF = "users";
static final String ITEMS_CF = "items";
static final String USER_IDS_CF = "userIDs";
static final String ITEM_IDS_CF = "itemIDs";

I have a hard time understanding this class, especially the column families. Is there an example I can look at, or could someone explain them with a small example?

The Javadoc says:

 * <p>
 * First, it uses a column family called "users". This is keyed by the user ID
 * as an 8-byte long. It contains a column for every preference the user
 * expresses. The column name is item ID, again as an 8-byte long, and value is
 * a floating point value represented as an IEEE 32-bit floating point value.
 * </p>
 * 
 * <p>
 * It uses an analogous column family called "items" for the same data, but
 * keyed by item ID rather than user ID. In this column family, column names are
 * user IDs instead.
 * </p>
 * 
 * <p>
 * It uses a column family called "userIDs" as well, with an identical schema.
 * It has one row under key 0. It contains a column for every user ID in the
 * model. It has no values.
 * </p>
 * 
 * <p>
 * Finally it also uses an analogous column family "itemIDs" containing item
 * IDs.
 * </p>
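
To make the encodings mentioned in the Javadoc concrete, here is a minimal sketch (not Mahout's own code) of how an 8-byte long ID and an IEEE 32-bit float preference can be turned into bytes with java.nio.ByteBuffer. The class and method names are made up for illustration, and big-endian byte order (ByteBuffer's default) is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Mahout or Cassandra.
class ColumnEncodingSketch {

    // Row keys and column names: an ID stored as an 8-byte long.
    static byte[] encodeId(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    static long decodeId(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Column values in "users" and "items": a 32-bit IEEE float.
    static byte[] encodePreference(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    static float decodePreference(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    public static void main(String[] args) {
        // User 1 expresses preference 3.0 for item 0:
        // in "users" the row key is the user ID and the column name is the item ID.
        byte[] rowKey = encodeId(1L);
        byte[] columnName = encodeId(0L);
        byte[] columnValue = encodePreference(3.0f);

        System.out.println("row key      = " + Arrays.toString(rowKey));
        System.out.println("column name  = " + Arrays.toString(columnName));
        System.out.println("column value = " + Arrays.toString(columnValue));
        System.out.println("decoded      = user " + decodeId(rowKey)
                + ", item " + decodeId(columnName)
                + ", preference " + decodePreference(columnValue));
    }
}
```

In "items" the same preference would be stored with the two IDs swapped: the item ID as the row key and the user ID as the column name.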

Solution

  • All of the following statements, which create the column families CassandraDataModel requires, should be run in cassandra-cli under the keyspace you created (recommender or another name).

    1: Table users

    The userID is the row key, each itemID is a column name, and the column value is the preference:

    CREATE COLUMN FAMILY users
    WITH comparator = LongType
    AND key_validation_class=LongType
    AND default_validation_class=FloatType;
    

    Insert values:

    set users[0][0]='1.0';
    set users[1][0]='3.0';
    set users[2][2]='1.0';
    

    2: Table items

    The itemID is the row key, each userID is a column name, and the column value is the preference:

    CREATE COLUMN FAMILY items
    WITH comparator = LongType
    AND key_validation_class=LongType
    AND default_validation_class=FloatType;
    

    Insert values:

    set items[0][0]='1.0';
    set items[0][1]='3.0';
    set items[2][2]='1.0';
    

    3: Table userIDs

    This table has a single row, under key 0, with one column per userID; the columns have no values:

    CREATE COLUMN FAMILY userIDs
    WITH comparator = LongType
    AND key_validation_class=LongType;
    

    Insert values:

    set userIDs[0][0]='';
    set userIDs[0][1]='';
    set userIDs[0][2]='';
    

    4: Table itemIDs

    This table has a single row, under key 0, with one column per itemID; the columns have no values:

    CREATE COLUMN FAMILY itemIDs
    WITH comparator = LongType
    AND key_validation_class=LongType;
    

    Insert values:

    set itemIDs[0][0]='';
    set itemIDs[0][1]='';
    set itemIDs[0][2]='';
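
The four tables above have to stay in sync: every preference appears once in users, once (mirrored) in items, and its two IDs appear in userIDs and itemIDs. As a rough sketch of that rule, the following Java snippet generates the cassandra-cli set statements for a given (user, item, preference) triple. The class, the Preference type, and the method name are hypothetical helpers for illustration, not part of Mahout or Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class CliStatementSketch {

    // One (user, item, preference) triple.
    static final class Preference {
        final long userId;
        final long itemId;
        final float value;

        Preference(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Builds the four cassandra-cli statements that record one preference.
    static List<String> statementsFor(Preference p) {
        List<String> statements = new ArrayList<>();
        String value = Float.toString(p.value);
        // users: row key = user ID, column name = item ID, value = preference
        statements.add(String.format(Locale.ROOT,
                "set users[%d][%d]='%s';", p.userId, p.itemId, value));
        // items: same data mirrored, row key = item ID, column name = user ID
        statements.add(String.format(Locale.ROOT,
                "set items[%d][%d]='%s';", p.itemId, p.userId, value));
        // userIDs / itemIDs: single row under key 0, one empty column per ID
        statements.add(String.format(Locale.ROOT, "set userIDs[0][%d]='';", p.userId));
        statements.add(String.format(Locale.ROOT, "set itemIDs[0][%d]='';", p.itemId));
        return statements;
    }

    public static void main(String[] args) {
        // The same three preferences as in the example data above.
        List<Preference> preferences = new ArrayList<>();
        preferences.add(new Preference(0L, 0L, 1.0f));
        preferences.add(new Preference(1L, 0L, 3.0f));
        preferences.add(new Preference(2L, 2L, 1.0f));

        for (Preference p : preferences) {
            for (String statement : statementsFor(p)) {
                System.out.println(statement);
            }
        }
    }
}
```

The output can repeat a userIDs or itemIDs statement when several preferences share an ID. That should be harmless, because a set on an existing column simply overwrites it.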