Training a neural network without historical data


I am building a highly personalised recommender system from scratch, and I have no historical data on interactions between users and items. Nevertheless, when a user is added to the system, he must provide a list of tags for items that:

  1. he really likes;
  2. he has no opinion about;
  3. he dislikes.

Then, based on those tags, I am able to match items to groups 1, 2 and 3.
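The question doesn't say how tags are matched to items. Purely as a hypothetical illustration, the sketch below assumes every item carries a set of tags and scores it by counting liked tags minus disliked tags. The functions `match_group` and `label_items` and the sample catalogue are invented for this example, not the asker's actual method.

```python
# Hypothetical tag-matching rule (an assumption, not the asker's actual method):
# score = (# liked tags on the item) - (# disliked tags on the item).
def match_group(item_tags, liked, neutral, disliked):
    """Return 1 (liked), 0 (no opinion), -1 (disliked), or None if nothing matches."""
    score = len(item_tags & liked) - len(item_tags & disliked)
    if score > 0:
        return 1
    if score < 0:
        return -1
    # Tie or no liked/disliked overlap: "no opinion" only if a neutral tag matches.
    if item_tags & neutral:
        return 0
    return None


def label_items(items, liked, neutral, disliked):
    """Map item name -> target value, skipping items that match no tag group."""
    labels = {}
    for name, tags in items.items():
        group = match_group(tags, liked, neutral, disliked)
        if group is not None:
            labels[name] = group
    return labels


if __name__ == "__main__":
    liked = {"mountain", "outdoor"}
    neutral = {"urban"}
    disliked = {"luxury"}
    items = {
        "trail bike": {"mountain", "outdoor", "sport"},
        "city bike": {"urban", "commute"},
        "designer watch": {"luxury", "fashion"},
        "cookbook": {"kitchen"},
    }
    print(label_items(items, liked, neutral, disliked))
    # {'trail bike': 1, 'city bike': 0, 'designer watch': -1}
```

Items that match no tag group (the cookbook here) are simply left out of the training set rather than forced into group 2.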

So I am thinking of sampling items from groups 1, 2 and 3 and assigning them the target values 1, 0 and -1 respectively in order to train my neural network. After the training step, I would have a highly personalised neural network for each user that would allow me to start recommending items that match that user's preferences despite having no historical data.
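A minimal sketch of the training step, assuming items are encoded as binary tag vectors and using the smallest possible network, a single tanh neuron trained by gradient descent on squared error, in place of whatever architecture you have in mind. The tag vocabulary and the names `encode`, `predict` and `train` are hypothetical.

```python
import math
import random

# Minimal sketch (assumptions): items are binary tag vectors, and one tanh
# neuron stands in for "the neural network". tanh outputs lie in (-1, 1),
# matching the targets 1 (likes), 0 (no opinion) and -1 (dislikes).

def encode(tags, vocab):
    """One binary feature per vocabulary tag."""
    return [1.0 if t in tags else 0.0 for t in vocab]


def predict(weights, bias, x):
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, n_features, epochs=500, lr=0.1, seed=0):
    """Per-sample gradient descent on 0.5 * (output - target)^2."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = predict(weights, bias, x)
            # Derivative w.r.t. the pre-activation z, where y = tanh(z).
            grad = (y - target) * (1.0 - y * y)
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


if __name__ == "__main__":
    vocab = ["mountain", "outdoor", "urban", "luxury", "fashion"]
    labelled = [
        ({"mountain", "outdoor"}, 1),
        ({"urban"}, 0),
        ({"luxury", "fashion"}, -1),
    ]
    samples = [(encode(tags, vocab), target) for tags, target in labelled]
    weights, bias = train(samples, len(vocab))
    # Score an unseen item that shares a tag with the "likes" group.
    print(round(predict(weights, bias, encode({"outdoor"}, vocab)), 2))
```

A real network would add hidden layers and per-item features beyond tags, but the training loop keeps the same shape: tag-derived targets in, fitted weights out.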

Of course, as the user starts providing feedback on the recommended items, I would update the network to match his new preferences.
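Updating on feedback can be sketched as one extra gradient step per rating. This assumes a single-tanh-neuron model over binary tag vectors; `update_on_feedback` and its learning rate are hypothetical choices, not a prescribed method.

```python
import math

# Sketch, assuming a single tanh neuron over binary tag vectors. Each piece of
# user feedback (+1 liked, 0 indifferent, -1 disliked) triggers one gradient
# step, so the model drifts toward the user's new preferences.

def predict(weights, bias, x):
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)


def update_on_feedback(weights, bias, x, feedback, lr=0.05):
    """Return new (weights, bias) after one gradient step on squared error."""
    y = predict(weights, bias, x)
    grad = (y - feedback) * (1.0 - y * y)
    new_weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
    return new_weights, bias - lr * grad


if __name__ == "__main__":
    # Hypothetical model that currently likes items with the first tag.
    weights, bias = [1.5, 0.0, 0.0], 0.0
    item = [1.0, 0.0, 0.0]
    before = predict(weights, bias, item)
    # The user keeps disliking items with that tag.
    for _ in range(50):
        weights, bias = update_on_feedback(weights, bias, item, -1)
    after = predict(weights, bias, item)
    print(before > 0, after < before)  # True True
```

A small learning rate keeps a single rating from overwriting everything the network learned from the tags; a larger one adapts faster but is noisier.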

With that said, does this approach make sense, or are neural networks not the best fit for this specific case?


Solution

  • First of all, you did not explain your specific question or problem clearly enough, which usually results in an answer you probably did not expect, but I'll try to give some meaningful information rather than a plain 42.

    You did not specify what you would like your recommendation system to achieve, so it is not clear what exactly your recommendations will be based on. Is it the correlation between user A's preferences and all other users' preferences that should suggest products user A has not seen but might like?

    That seems to be the most likely case, based on your description. So you are looking for some sort of solution to the Netflix challenge, a problem usually called collaborative filtering. Your model as described is much simpler than what Netflix or Amazon work with, but it still cannot operate without any data, so initial guesses are going to be completely off and annoy users. One of my friends is constantly annoyed by "other people who liked this movie also watched that" recommendations; he says they are always wrong, even though Netflix has lots of data and a comprehensive recommendation engine. So expect a lot of frustration and possibly even vandalism (users deliberately providing incorrect feedback because of the poor quality of the recommendations). The only way to avoid it is to collect data first by asking for feedback, and only give recommendations after you have collected a sufficient number of samples.
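    To make the "user A versus all other users" idea concrete, here is a minimal user-based collaborative-filtering sketch on the question's 1/0/-1 scale. It is a plain similarity-weighted vote, not a neural network and not what Netflix actually runs. `MIN_OVERLAP` is a hypothetical threshold standing in for "only give recommendations after you have collected a sufficient number of samples", and the ratings are made up.

```python
# Minimal user-based collaborative filtering (illustration only).
# Ratings: 1 = likes, 0 = no opinion, -1 = dislikes.
# MIN_OVERLAP is a hypothetical threshold: users sharing fewer rated items
# with the target are ignored because there is not enough data yet.
MIN_OVERLAP = 2


def similarity(a, b):
    """Mean agreement (product of ratings) over co-rated items, or None if too few."""
    common = a.keys() & b.keys()
    if len(common) < MIN_OVERLAP:
        return None
    return sum(a[i] * b[i] for i in common) / len(common)


def recommend(target, others, top_n=3):
    """Score items the target has not rated by similarity-weighted votes."""
    scores = {}
    for ratings in others:
        sim = similarity(target, ratings)
        if sim is None or sim <= 0:
            continue  # not enough shared data, or tastes do not agree
        for item, r in ratings.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * r
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, s in ranked if s > 0][:top_n]


if __name__ == "__main__":
    user_a = {"mountain bike": 1, "tent": 1, "perfume": -1}
    others = [
        {"mountain bike": 1, "tent": 1, "gopro": 1, "perfume": -1},
        {"mountain bike": -1, "perfume": 1, "handbag": 1},
        {"tent": 1, "kayak": 1},  # only one co-rated item: ignored
    ]
    print(recommend(user_a, others))  # ['gopro']
```

    With no other users, or none passing the overlap threshold, `recommend` returns an empty list, which is exactly the cold-start problem described above.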

    We are slowly getting to the actual question as stated: whether a neural network is a good tool for the job. It is, if you have enough data to fit a simple model like the one you described with a small number of false positives (poor recommendations) and a large number of true positives (correct recommendations). How much data you need depends on the number of products and on how strongly their being liked or disliked is correlated. If you have 2 products with no correlation, no matter how much data you collect it will do no good. If all your products are very similar, the correlation will be strong but spread equally across all of them, so again you won't be able to give any useful advice until you collect a very large amount of data, which would merely filter out a few poor goods. The best case is products that are highly correlated yet very different (something like a high-end mountain bike and a GoPro camera). Those can be reliably chained based on other users' preferences.
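    How strongly two products are liked or disliked together can be measured with a Pearson correlation over the users who rated both. The sketch below assumes ratings on the 1/0/-1 scale, and the toy ratings are made up for illustration: a value near 1 means one item's rating predicts the other's, while a value near 0 means there is nothing to chain.

```python
import math

# Sketch: Pearson correlation between two items' ratings over users who rated
# both (ratings 1 / 0 / -1). Near +1: liked and disliked together.
# Near 0: no usable signal, however much data is collected.

def item_correlation(ratings, item_a, item_b):
    pairs = [(u[item_a], u[item_b]) for u in ratings if item_a in u and item_b in u]
    if len(pairs) < 2:
        return None  # not enough co-rating users to say anything
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # an item everyone rates the same way carries no correlation info
    return cov / (sx * sy)


if __name__ == "__main__":
    users = [
        {"mountain bike": 1, "gopro": 1, "umbrella": 1},
        {"mountain bike": 1, "gopro": 1, "umbrella": -1},
        {"mountain bike": -1, "gopro": -1, "umbrella": 1},
        {"mountain bike": -1, "gopro": 0, "umbrella": -1},
    ]
    print(round(item_correlation(users, "mountain bike", "gopro"), 2))     # 0.9
    print(round(item_correlation(users, "mountain bike", "umbrella"), 2))  # 0.0
```

    With only a handful of co-rating users these estimates are very noisy, which is another way of saying you need to collect data before trusting them.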

    So without further information, you won't get much more useful insight. What you describe makes sense, if I have filled in the blanks somewhat correctly, but whether it will work and how much data you will need really depend on the specifics of the products and users involved.

    I hope it helps.