data-mining, collaborative-filtering

Collaborative Filtering: Ways to determine implicit scores for products for each user?


Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.

My objective is to calculate a score for each product that a user has some sort of history with.

The data I am currently collecting:

  • User order history
  • Product pageview history for both anonymous and registered users

All of this data is timestamped.

What I'm looking for

I'd like suggestions on a couple of things, and ideally this question should be treated as a discussion rather than as a search for a single 'right' answer.

  • Any additional data I can collect for a user that can directly imply an interest in a product
  • Algorithms/equations for turning this data into scores for each product

What I'm NOT looking for

Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:

  • Generating a number of user clusters (21 at the moment) with the k-means clustering algorithm, using the Pearson correlation coefficient for the distance score
  • For each user (on demand), calculating a graph of similar users by looking for their most and least similar users within their cluster, and repeating to an arbitrary depth
  • Calculating a score for each product based on the preferences of other users within the user's graph
  • Sorting the scores to return a list of recommendations
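
For readers unfamiliar with the distance score mentioned in the first step, here is a minimal sketch of a Pearson-based distance between two users. It is an illustration only, not the implementation described above: the function name `pearson_distance`, the dict-of-scores input format, and the choice to return 1.0 when the correlation is undefined are all assumptions.

```python
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation over shared products.

    `a` and `b` map product ids to scores (hypothetical input format).
    0.0 means perfectly correlated tastes, 2.0 means perfectly opposed.
    """
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        # Assumption: with fewer than two shared products, treat as uncorrelated.
        return 1.0

    xs = [a[p] for p in shared]
    ys = [b[p] for p in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a user with constant scores carries no correlation signal.
        return 1.0

    return 1.0 - cov / sqrt(var_x * var_y)
```

Only products that both users have a score for contribute, which is why the quality of the per-product input scores matters so much for everything that follows.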

Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place.


Solution

  • Here's a haymaker of a response:

    • time spent looking at a product
    • semantic interpretation of comments left about the product
    • make a discussion page about a product, brand, or product category and semantically interpret the comments
    • whether they shared a product page (email, del.icio.us, etc.)
    • browser (a mobile user might spend less time on the page than a laptop user while still being very interested) and connection speed (which affects the amount of time spent on the page)
    • Facebook profile similarity
    • heatmap data (e.g. à la Kissmetrics)

    What kind of products are you selling? That might help us answer you better. (Since this is an old question, I am addressing both @Andrew Ingram and anyone else who has the same question and found this thread through search.)
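
As one possible way to turn signals like these into a per-product score, here is a minimal sketch that sums weighted, time-decayed events for each (user, product) pair. Everything specific in it is an assumption made for illustration: the event type names, the weights in `EVENT_WEIGHTS`, the 30-day half-life, and the function name `implicit_scores`. The weights in particular would need tuning against real data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical weights: stronger signals of interest count for more.
EVENT_WEIGHTS = {"order": 5.0, "share": 3.0, "pageview": 1.0}
# Hypothetical half-life: an event loses half its weight every 30 days.
HALF_LIFE_DAYS = 30.0


def implicit_scores(events, now):
    """Aggregate timestamped events into a score per (user, product) pair.

    `events` is an iterable of (user_id, product_id, event_type, timestamp).
    For anonymous visitors, `user_id` could be a session or cookie id.
    """
    scores = defaultdict(float)
    for user_id, product_id, event_type, timestamp in events:
        # Age in days, clamped at zero in case of clock skew.
        age_days = max((now - timestamp).total_seconds() / 86400.0, 0.0)
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        # Unknown event types contribute nothing.
        scores[(user_id, product_id)] += EVENT_WEIGHTS.get(event_type, 0.0) * decay
    return dict(scores)


now = datetime(2024, 1, 31)
events = [
    ("u1", "p1", "order", now),
    ("u1", "p1", "pageview", now - timedelta(days=30)),
    ("u1", "p2", "pageview", now),
]
scores = implicit_scores(events, now)
```

Because all of the collected data is timestamped, the decay term lets recent behaviour outweigh old behaviour. Signals such as time on page could be folded in the same way, for example by scaling the pageview weight by dwell time, again as an assumption to be validated.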