
Dealing with lots of data in Firebase for a recommender system


I am building a recommender system where I use Firebase to store and retrieve data about movies and user preferences.

Each movie can have several attributes, and the data looks as follows:

{ 
    "titanic": 
    {"1997": 1, "english": 1, "dicaprio": 1,    "romance": 1, "drama": 1 }, 
    "inception": 
    { "2010": 1, "english": 1, "dicaprio": 1, "adventure": 1, "scifi": 1}
...
}

To make the recommendations, my algorithm requires all the data (movies) as input and matches it against a user profile.

However, in production I need to retrieve more than 10,000 movies. While the algorithm handles this relatively fast, loading the data from Firebase takes a long time.

I retrieve the data as follows:

firebase.database().ref(moviesRef).on('value', function(snapshot) {
    // snapshot.val();
}, function(error){
    console.log(error)
});

I am therefore wondering whether you have any thoughts on how to speed things up. Are there any plugins or techniques known to solve this?

I am aware that denormalization could help split the data up, but the problem is really that I need ALL movies and ALL the corresponding attributes.


Solution

  • My suggestion would be to use Cloud Functions to handle this.

    Solution 1 (Ideally)

    If you can calculate suggestions every hour / day / week

    You can use a Cloud Functions cron job that fires daily / weekly and calculates recommendations per user. This way you can achieve a result more or less similar to what Spotify does with their weekly playlists / recommendations.

    The main advantage of this is that your users wouldn't have to wait for all 10,000 movies to be downloaded. The download would happen in a cloud function, say every Sunday night, which compiles a list of 25 recommendations and saves it into the user's data node. You can then download that list when the user accesses their profile.

    Your cloud functions code would look like this :

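    const functions = require('firebase-functions');
    const admin = require('firebase-admin');
    admin.initializeApp();

    // moviesRef, allUsersRef and userRef are placeholders for your own paths / references.

    // Triggered by a message published on the 'weekly-tick' Pub/Sub topic.
    exports.weekly_job = functions.pubsub.topic('weekly-tick').onPublish(() => {
      // Return the promise so the function stays alive until the writes finish.
      return getMoviesAndUsers();
    });

    function getMoviesAndUsers () {
      // once() reads the data a single time; on() would leave a listener attached.
      return Promise.all([
        admin.database().ref(moviesRef).once('value'),
        admin.database().ref(allUsersRef).once('value')
      ]).then(([moviesSnapshot, usersSnapshot]) => {
        return createRecommendations(moviesSnapshot.val(), usersSnapshot.val());
      });
    }

    function createRecommendations (movies, allUsers) {
      // do something magical with movies and allUsers here.

      // then write the recommendations to each user's profile, kind of like
      return userRef.update({"userRecommendations" : {"reco1" : "Her", "reco2" : "Black Mirror"}});
      // etc.
    }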
    

    Forgive the pseudo-code. I hope this gives an idea though.
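    Since the question doesn't say how the recommendations are calculated, the following is only a stand-in for the "something magical" step, not the asker's algorithm. It assumes a very simple rule: a movie's score is the number of attributes it shares with the user's profile, and the profile uses the same { attribute: 1 } shape as the movies. The names scoreMovie and topRecommendations and the sample profile are made up for illustration.

```javascript
// Hypothetical scoring step: count how many attributes a movie shares
// with the user's profile. Both use the { attribute: 1 } shape from the question.
function scoreMovie(movieAttributes, userProfile) {
  let score = 0;
  for (const attribute of Object.keys(movieAttributes)) {
    if (userProfile[attribute]) {
      score += 1;
    }
  }
  return score;
}

// Return the ids of the `limit` best-scoring movies, best first.
// Movies that share nothing with the profile are left out.
function topRecommendations(movies, userProfile, limit) {
  return Object.keys(movies)
    .map((id) => ({ id, score: scoreMovie(movies[id], userProfile) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.id);
}

// Movie data copied from the question; the profile is invented.
const movies = {
  titanic: { "1997": 1, english: 1, dicaprio: 1, romance: 1, drama: 1 },
  inception: { "2010": 1, english: 1, dicaprio: 1, adventure: 1, scifi: 1 }
};
const profile = { dicaprio: 1, scifi: 1, adventure: 1 };

console.log(topRecommendations(movies, profile, 25));
```

    Because this is a pure function over plain objects, it can run unchanged inside the cloud function once movies and the user's profile have been read with snapshot.val().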

    Then on your frontend you only have to fetch the userRecommendations for each user. This shifts the bandwidth and computation from the user's device to a cloud function. As for the efficiency of the calculation itself, I can't make any suggestions without knowing how you calculate recommendations.
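    For the frontend to read only userRecommendations, the scheduled job has to write one small node per user. A rough sketch of how those writes could be batched into a single update() call is below. It assumes the recommendations have already been computed as a { uid: [movie, ...] } map and that users live directly under the reference you call update() on; buildRecommendationUpdates and the sample user ids are hypothetical.

```javascript
// Build one multi-path update object from per-user recommendation lists.
// `recommendationsByUser` is a hypothetical { uid: [movieId, ...] } map.
function buildRecommendationUpdates(recommendationsByUser) {
  const updates = {};
  for (const uid of Object.keys(recommendationsByUser)) {
    const recos = {};
    recommendationsByUser[uid].forEach((movieId, index) => {
      recos["reco" + (index + 1)] = movieId;
    });
    // Each key is a path relative to the reference update() is called on,
    // so only the userRecommendations child of each user is overwritten.
    updates[uid + "/userRecommendations"] = recos;
  }
  return updates;
}

const updates = buildRecommendationUpdates({
  alice: ["Her", "Black Mirror"],
  bob: ["Inception"]
});
console.log(updates);

// Inside the cloud function this could then be written in one call, e.g.
// admin.database().ref(allUsersRef).update(updates);
```

    Writing everything in one update() avoids issuing a separate write per user, though for a very large user base you may still want to split the object into chunks.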

    Solution 2

    If you can't calculate suggestions every hour / day / week, and you have to do it each time the user accesses their recommendations panel

    Then you can trigger a cloud function every time the user visits their recommendations page. A quick cheat solution I use for this is to write a value such as {getRecommendations:true} into the user's profile once on page load, and then listen for changes to getRecommendations in Cloud Functions. As long as you have a structure like this:

    userID > getRecommendations : true

    And as long as you have proper security rules so that each user can only write to their own path, this method also gives you the correct userID of the user making the request, so you know which user to calculate recommendations for. A cloud function can most likely pull 10,000 records faster than the client and saves the user bandwidth; in the end it writes only the recommendations to the user's profile (similar to Solution 1 above). Your setup would look like this:

    [Frontend Code]

    //on pageload
    userProfileRef.update({"getRecommendations" : true});
    userRecommendationsRef.on('value', function(snapshot) {  gotUserRecos(snapshot.val());  });
    

    [Cloud Functions (Backend Code)]

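    const functions = require('firebase-functions');
    const admin = require('firebase-admin');
    admin.initializeApp();

    // moviesRef, userRefFromUID and userRecommendationsRef are placeholders; build them from uid.
    exports.userRequestedRecommendations = functions.database.ref('/users/{uid}/getRecommendations').onWrite((change, context) => {
      const uid = context.params.uid;
      // once() reads a single time; returning the promise keeps the function alive until the write is done.
      return Promise.all([
        admin.database().ref(moviesRef).once('value'),
        admin.database().ref(userRefFromUID).once('value')
      ]).then(([moviesSnapshot, userSnapshot]) => {
        const movies = moviesSnapshot.val();
        const usersMovieTasteInformation = userSnapshot.val();
        // do something magical with movies and user's preferences here.
        // then
        return userRecommendationsRef.update({"getRecommendations" : {"reco1" : "Her", "reco2" : "Black Mirror"}});
      });
    });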
    

    Since your frontend will be listening for changes at userRecommendationsRef, as soon as your cloud function is done, your user will see the results. This might take a few seconds, so consider using a loading indicator.
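    One detail about the trigger is worth sketching: onWrite fires for every write to getRecommendations, including deletions and writes that set it back to false. If you (or the function) ever reset the flag, a small guard like the one below keeps the expensive movie download from running for those writes. The helper name isRecommendationRequest is made up, and it assumes the flag is the boolean true as in the frontend snippet.

```javascript
// Decide whether a write to getRecommendations is a real request.
// `afterValue` is the value at the path after the write (null if it was deleted).
function isRecommendationRequest(afterValue) {
  return afterValue === true;
}

console.log(isRecommendationRequest(true));  // a page load asking for recommendations
console.log(isRecommendationRequest(null));  // the flag was deleted
console.log(isRecommendationRequest(false)); // the flag was reset

// Inside the trigger this could be used as:
// if (!isRecommendationRequest(change.after.val())) { return null; }
```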

    P.S. 1: I ended up using more pseudo-code than originally intended and removed error handling etc., hoping that this generally gets the point across. If anything is unclear, comment and I'll be happy to clarify.

    P.S. 2: I'm using a very similar flow for a mini-internal-service I built for one of my clients, and it's been happily operating for longer than a month now.