algorithm optimization matching recommendation-engine bipartite

Recommendation engine optimization for newsfeed

I am looking to write a recommendation engine to optimize a newsfeed that I want to implement in my app. It would be based on preferences that users pick during the sign-up phase.

The logic is as follows: a user signs up and chooses one or more topics of interest from a list of 15. In the app, users can post content such as photos, text, etc.

I want to match people using the app with content from users who picked the same preferences during the sign-up phase (or whose preferences have a high correlation index, called C).

In order to do so, I thought about implementing a “relevancy” score that would be attached to each post.

That score would be calculated as follows: Score = C x P x F, where C is the index of correlation between the two users’ preferences, P is the popularity of the user who posted the content, and F is the freshness of the post (so that recently posted content is displayed first). The news feed would then display posts from the highest to the lowest score in each user’s feed.
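As a rough illustration of the scoring described above, here is a minimal TypeScript sketch. It assumes that C, P and F have each already been normalized to the range 0 to 1. The names `Candidate`, `freshness`, `relevancy` and `rankFeed`, as well as the exponential decay with a 24-hour half-life used for freshness, are hypothetical choices made for this example and are not fixed by anything in the question.

```typescript
// Hypothetical shape of one candidate post for a given viewer.
// Assumption: c, p and f are all normalized to [0, 1].
interface Candidate {
  postId: string;
  c: number; // correlation between the viewer's and the author's topics
  p: number; // popularity of the author
  f: number; // freshness of the post
}

// Assumed freshness model: the value halves every `halfLifeHours` hours.
function freshness(ageHours: number, halfLifeHours: number = 24): number {
  return Math.pow(0.5, Math.max(0, ageHours) / halfLifeHours);
}

// Score = C x P x F, as defined in the question.
function relevancy(post: Candidate): number {
  return post.c * post.p * post.f;
}

// Returns a new array sorted from the highest to the lowest score.
function rankFeed(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => relevancy(b) - relevancy(a));
}
```

Because C depends on the viewer, the score has to be computed per viewer (for example on the client, or in a per-user job) rather than stored once on the post. Only P and the post's creation time can be stored with the post itself.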

The difficulty here would be to generate a score for each post that would differ for every news feed and to translate that in our database in order to make the right number of requests. I am using Expo (React Native) and Firestore as a database.

Here is a concrete example: let’s say that during the sign-up phase I have the choice between 5 topics of interest: Sports, Photography, Music, Fashion and Travels. I choose Sports and Travels. After completing that phase and landing on the app’s news feed, I want to be matched with content that is primarily related to Sports and Travels (let’s not even consider weighting the topics here). Therefore, I want to display content from other users who chose the exact same categories (the correlation index would be 1) or the closest ones (the next best correlation index here would be 0.5).

I would then get content from people that chose Sports and Travels, then content from people that chose Sports or Travels, then content from people that chose Sports and Travels amongst many others (each time reducing our C index).

How exactly can I translate this into an algorithm? I went through a lot of documentation about assignment problem algorithms, weighted bipartite graphs and combinatorial optimization in general, but I’m still stuck.

Thank you for your time, I really appreciate it.

Solution

Let's say we have two sets of interests, A and B.

One way to define the correlation is:

Correlation =  size(intersection(A, B)) / max( size(A), size(B) )

Scenario 1:

Exact match: A: { Sport, Travel } B: { Sport, Travel }

Correlation := size({ Sport, Travel }) / 2 = 2/2 = 1

Scenario 2:

B is a superset of A: A: { Sport, Travel } B: { Sport, Travel, Car, Dress, Movie }

Correlation := size({ Sport, Travel }) / 5 = 2/5 = 0.4

Scenario 3:

B is a subset of A: A: { Sport, Travel } B: { Sport }

Correlation := size({ Sport }) / 2 = 1/2 = 0.5
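The definition above can be sketched in TypeScript as follows. This is only an illustration: the function name `correlation` and the choice to return 0 when either user has no topics are assumptions made for the example.

```typescript
// Correlation = size(intersection(A, B)) / max(size(A), size(B))
// Assumption: topics are plain strings and compared case-sensitively.
function correlation(a: string[], b: string[]): number {
  const setA = new Set(a); // de-duplicate each user's topics
  const setB = new Set(b);

  // Assumed convention: no topics on either side means no correlation.
  if (setA.size === 0 || setB.size === 0) {
    return 0;
  }

  // Count the topics that both users picked.
  let shared = 0;
  setA.forEach((topic) => {
    if (setB.has(topic)) {
      shared += 1;
    }
  });

  return shared / Math.max(setA.size, setB.size);
}
```

Calling it with the three scenarios gives `correlation(["Sport", "Travel"], ["Sport", "Travel"])` = 1, `correlation(["Sport", "Travel"], ["Sport", "Travel", "Car", "Dress", "Movie"])` = 0.4 and `correlation(["Sport", "Travel"], ["Sport"])` = 0.5. With only 15 topics, this is cheap enough to compute per viewer and author pair when the feed is built.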