Tags: python, apache-spark, pyspark, collaborative-filtering, apache-spark-mllib

Multiple features in collaborative filtering - Spark


I have a CSV file that looks like:

customer_ID, location, ....other info..., item-bought, score

I am trying to build a collaborative filtering recommender in Spark. Spark takes data of the form:

userID, itemID, value

but my data has more columns, and I want all of the user's info to be used instead of just the userID. I tried grouping the columns into one column as:

(customerID,location,....),itemID,score

but ALS.train gives me this error:

TypeError: int() argument must be a string or a number, not 'tuple'

How can I get Spark to take multiple key/value columns and not only three columns? Thanks.
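For reference, here is a minimal sketch of the input the RDD-based MLlib ALS accepts: three plain columns with integer user and item IDs. The `sc` SparkContext and the sample values are assumed for illustration.

```python
from pyspark.mllib.recommendation import ALS, Rating

# ALS expects (user, product, rating) with *integer* user and product IDs,
# typically wrapped as Rating objects; `sc` is an existing SparkContext.
ratings = sc.parallelize([Rating(1, 10, 4.0), Rating(2, 10, 3.0)])
model = ALS.train(ratings, rank=10, iterations=5)

# Passing a composite key instead of an int is what triggers the error:
# bad = sc.parallelize([((1, "NYC"), 10, 4.0)])
# ALS.train(bad, rank=10, iterations=5)
# -> TypeError: int() argument must be a string or a number, not 'tuple'
```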


Solution

  • For each customer, identify the columns you would like to use to distinguish user-entities. Create a table (e.g. in SQL) in which each row contains the information for one user-entity, and use the row number in this table as the userID.

    Do the same for your items if necessary, and provide these integer IDs to ALS (a sketch follows below).
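A minimal PySpark sketch of that approach, assuming an RDD `rows` of `((customer_ID, location, ...), item_bought, score)` tuples and an existing SparkContext `sc`. It uses `zipWithIndex` instead of a SQL table to assign the row-number IDs, and assumes the sets of distinct users and items are small enough to broadcast:

```python
from pyspark.mllib.recommendation import ALS, Rating

# rows: RDD of ((customer_ID, location, ...), item_bought, score)  -- assumed

# Assign a unique integer ID to each distinct user-entity (the "row number")
user_ids = rows.map(lambda r: r[0]).distinct().zipWithIndex()  # (entity_tuple, id)
user_id_bc = sc.broadcast(user_ids.collectAsMap())

# Do the same for items if they are not already integers
item_ids = rows.map(lambda r: r[1]).distinct().zipWithIndex()
item_id_bc = sc.broadcast(item_ids.collectAsMap())

# Replace the composite keys with their integer IDs and train ALS
ratings = rows.map(lambda r: Rating(user_id_bc.value[r[0]],
                                    item_id_bc.value[r[1]],
                                    float(r[2])))
model = ALS.train(ratings, rank=10, iterations=10)
```

Keeping the `(entity_tuple, id)` mapping around also lets you translate the model's recommendations back into the original customer attributes afterwards.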