algorithm machine-learning pca ranking

How do you assign weights to features in a weighted composite score?


I am trying to implement a new vendor ranking system for an online marketplace website. I want to sort vendors by a composite score, from highest to lowest. At the moment, I am thinking of using a simple linear model to calculate the score, something like

score = w1 * f1 + w2 * f2 + w3 * f3 + ...

where f1, f2, ... are the different features (e.g. average review score, order cancellation rate, response rate, etc.) and w1, w2, ... are the corresponding weights for those features.

I want to score vendors from 0 to 100 for each item and sort the items by this score.
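
As a concrete sketch of the scoring described above: the vendor names, feature values, hand-picked weights, and the min-max rescaling to 0-100 are all illustrative assumptions, not a settled design. The features are assumed to be pre-normalized to the range 0 to 1.

```python
def composite_score(features, weights):
    """Weighted sum of feature values: w1 * f1 + w2 * f2 + ..."""
    return sum(weights[name] * value for name, value in features.items())


def scale_to_0_100(raw_scores):
    """Min-max rescale a list of raw scores onto the 0-100 range."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All vendors tie; any constant works, 50 is an arbitrary choice.
        return [50.0 for _ in raw_scores]
    return [100.0 * (s - lo) / (hi - lo) for s in raw_scores]


# Hypothetical vendors with made-up feature values.
vendors = {
    "vendor_a": {"avg_review": 0.90, "cancel_rate": 0.05, "response_rate": 0.80},
    "vendor_b": {"avg_review": 0.70, "cancel_rate": 0.20, "response_rate": 0.95},
    "vendor_c": {"avg_review": 0.50, "cancel_rate": 0.10, "response_rate": 0.60},
}

# Hand-picked weights (assumption); a negative weight penalizes cancellations.
weights = {"avg_review": 0.5, "cancel_rate": -0.3, "response_rate": 0.2}

names = list(vendors)
raw = [composite_score(vendors[n], weights) for n in names]
scores = dict(zip(names, scale_to_0_100(raw)))

# Sort vendors from the highest score to the lowest.
ranking = sorted(names, key=scores.get, reverse=True)
print(ranking, scores)
```

Min-max scaling makes the scores relative to the current set of vendors, so a vendor's score can change when other vendors are added or removed.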

What I am having trouble with is finding a way to assign optimal weights to each feature. Is there a way to assign weights so as to optimize for something like, say, the probability that a user makes a purchase, or something more intangible like quality? After some googling, I found some papers that use PCA to create a composite index. But since I am not very familiar with PCA, I am not sure whether it is suited to this case.

I would be really grateful if someone could point me in the right direction. If I am approaching this problem in entirely the wrong way, I would appreciate it if someone could point that out as well.


Solution

  • This looks like a standard supervised learning problem. If you have enough labelled data, you can train something simple like linear regression (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) or something more complex like gradient boosting (http://xgboost.readthedocs.io/en/latest/python/python_intro.html). The labels for your data could be how often users bought something, which makes this a regression problem. With linear regression, the fitted coefficients play the role of the weights w1, w2, ... in your score.
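
A minimal sketch of the linear regression route, assuming scikit-learn and NumPy are installed. The data here is synthetic: the three feature columns, the `true_w` values used to generate the labels, and the noise level are made up purely for illustration, and the min-max rescaling to 0-100 is one possible choice rather than part of the suggestion above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic feature matrix: one row per vendor, columns standing in for
# (avg review score, cancel rate, response rate), each scaled to [0, 1].
n_vendors = 200
X = rng.uniform(0.0, 1.0, size=(n_vendors, 3))

# Synthetic label, e.g. a purchase frequency per vendor (assumption).
# In a real system this would come from logged purchase data.
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + rng.normal(0.0, 0.1, size=n_vendors)

# Fit ordinary least squares; coef_ holds one learned weight per feature.
model = LinearRegression()
model.fit(X, y)
learned_weights = model.coef_

# The model's prediction is the composite score; rescale it to 0-100.
predicted = model.predict(X)
lo, hi = predicted.min(), predicted.max()
scores = 100.0 * (predicted - lo) / (hi - lo)

# Vendor indices ordered from the highest score to the lowest.
ranking = np.argsort(-scores)
print(learned_weights)
print(ranking[:5])
```

Because the weights are fitted to the label, the resulting ranking optimizes for whatever that label measures, so the choice of label (purchases, repeat orders, and so on) matters more than the choice of model.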