Scaling Firefeed Followers


I am in the research phase of a realtime application that I want to write, and I think Firebase is the right choice, but I'm currently stuck trying to figure out my data schema. My application is similar to the Firefeed example application in that it is a social inbox. My issue is with the following code, where the data is looped over and copied to every user who "follows" the current user. Theoretically, if this were Twitter and someone like Kim Kardashian posted a new Spark, it would have to loop through and save 50,000,000+ records.

Doing this on the client side, or doing this at all, seems extremely slow and error-prone to me. Is this a valid concern? I realize my app has zero users right now, but I'd like to plan my scaling ahead of time.

// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
    list.forEach(function(follower) {
        var childRef = firebase.child("users").child(follower.name());
        childRef.child("feed").child(sparkRefId).set(true);
    });
});

I'd really appreciate any help and insight here!

Thanks.


Solution

  • tl;dr: I'd wait until you're further along before coding up solutions. Avoid premature optimization.

    Distant future scaling issues are very hard to optimize for because it's very difficult to predict how people will end up using your software.

    But, to answer your specific question, there are ways to handle the Kim Kardashians of the social-media world. It all comes down to partitioning behavior: you're going to have to treat them differently than the rest of your users, and you'll have to do that regardless of the tech stack you use (there's a rough sketch of one approach at the end of this answer).

    The extent to which you partition behavior depends a lot on the distribution of your users. Remember Tom from MySpace? That's an extreme example. I bet there were references to isTom peppered all over the codebase to deal with him, but we probably don't need to go that far.

    The snippet of code in your question already has a lot going for it in terms of scale: it distributes the data across all of the followers, and in doing so avoids creating any hot spots in the data. It will, however, take some time to run for 50,000,000 followers.

    My first attempt at optimization would be to take the same code and put it on a node worker, then switch the client to register a task for that node worker for really popular users (both halves are sketched below).

    If that still wasn't fast enough, I'd start looking into ways to partition the data itself for my super-users (see the last sketch below).
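
    As a rough illustration of that behavioral partitioning, here is a minimal sketch in the same legacy Firebase JavaScript style as the question's snippet. The FOLLOWER_THRESHOLD constant, the publishSpark function, and the fanoutTasks path are hypothetical names of my own, not part of Firefeed.

    // Hypothetical sketch: fan out directly for ordinary users, but hand
    // popular users' sparks off to a server-side worker via a task queue.
    var FOLLOWER_THRESHOLD = 10000; // assumed cutoff for "popular" users

    function publishSpark(currentUser, sparkRefId) {
        currentUser.child("followers").once("value", function(list) {
            if (list.numChildren() < FOLLOWER_THRESHOLD) {
                // Small audience: the same fan-out-on-write as the original snippet.
                list.forEach(function(follower) {
                    firebase.child("users").child(follower.name())
                            .child("feed").child(sparkRefId).set(true);
                });
            } else {
                // Popular user: just enqueue a task and let the worker do the copying.
                firebase.child("fanoutTasks").push({
                    author: currentUser.name(),
                    spark: sparkRefId
                });
            }
        });
    }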
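
    And a matching sketch of what that node worker might look like, using the legacy Firebase Node client; fanoutTasks is the same assumed queue location as above, and the URL is a placeholder.

    // Hypothetical worker: consume queued fan-out tasks and copy the spark ID
    // into each follower's feed on the server, off the client's critical path.
    var Firebase = require("firebase");
    var root = new Firebase("https://<your-firebase>.firebaseio.com");

    root.child("fanoutTasks").on("child_added", function(taskSnap) {
        var task = taskSnap.val();
        root.child("users").child(task.author).child("followers")
            .once("value", function(list) {
                list.forEach(function(follower) {
                    root.child("users").child(follower.name())
                        .child("feed").child(task.spark).set(true);
                });
                // Drop the task once the fan-out is done.
                taskSnap.ref().remove();
            });
    });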
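
    Finally, one way the data itself could be partitioned for super-users (my assumption, not something Firefeed does) is to stop copying their sparks entirely and merge them into a follower's feed at read time. The loadFeed function and the celebrityIds list are hypothetical.

    // Hypothetical fan-out-on-read: a feed is the user's own copied items plus
    // recent sparks pulled straight from the "celebrity" accounts they follow.
    function loadFeed(userRef, celebrityIds, callback) {
        var items = [];
        var pending = 1 + celebrityIds.length;

        function done() {
            pending -= 1;
            if (pending === 0) callback(items);
        }

        // 1. The user's own materialized feed (ordinary accounts they follow).
        userRef.child("feed").once("value", function(feedSnap) {
            feedSnap.forEach(function(item) { items.push(item.name()); });
            done();
        });

        // 2. Recent sparks from celebrity accounts, merged in at read time.
        celebrityIds.forEach(function(celebId) {
            firebase.child("users").child(celebId).child("sparks")
                    .limit(100).once("value", function(sparkSnap) {
                sparkSnap.forEach(function(spark) { items.push(spark.name()); });
                done();
            });
        });
    }

    The trade-off is the classic one between write-time and read-time fan-out: super-users' posts become cheap to publish, at the cost of a slightly more expensive feed read for their followers.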