Timeline Reconstruction When a User Is Followed

This question is very similar to this one; however, that one has no answers. I am posting this one with more clarity in the hope of receiving an answer.

According to this presentation, Twitter uses a fanout method to push Tweets to each individual user's timeline in Redis. Obviously, this fanout only takes place when a user you're following Tweets something.

Suppose a new user, who has never followed anyone before (and consequently has no Tweets in their timeline), decides to follow someone. Using just the above method, they would have to wait until the user they followed Tweeted something before anything showed up on their timeline. From what I have observed, this is not the case: Twitter pulls in the latest Tweets from the followed user.

Now suppose that a new user follows 5 users. How does Twitter organize and push those Tweets into the user's timeline in Redis?

Suppose a user already follows 5 users and has a fair number of Tweets from these users in their timeline. When they follow another 5 users, how are these users' individual Tweets pushed into the initial user's timeline in Redis in the correct order? More importantly, how does it calculate how many to bring in from each user (seeing that timelines are capped at 800 Tweets)?

Solution

Here is how I would try to implement this, if I understand your question correctly.

Store each tweet in a hash. The key of the hash could be something like tweet:<tweetID>. Store the IDs of a given user's tweets in a sorted set named user:<userID>:tweets. Set the score of each tweet to its Unix timestamp, so the tweets appear in the correct order. You can then get the 800 most recent tweet IDs for the user with the ZREVRANGEBYSCORE command:

  ZREVRANGEBYSCORE user:<userID>:tweets +inf -inf LIMIT 0 800
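As a rough illustration of that data model, here is a minimal Python sketch that uses plain dictionaries as an in-memory stand-in for Redis. The function names and the hash fields (user, text, ts) are my own assumptions for the example; the comments name the Redis command each step would correspond to.

```python
import time

tweets = {}        # "tweet:<tweetID>" -> dict of fields (stand-in for a hash)
user_tweets = {}   # "user:<userID>:tweets" -> {tweet_id: timestamp} (stand-in for a sorted set)


def post_tweet(user_id, tweet_id, text, ts=None):
    """Store the tweet and index its ID under the author, scored by timestamp."""
    ts = time.time() if ts is None else ts
    # Roughly: HSET tweet:<tweetID> user ... text ... ts ...
    tweets[f"tweet:{tweet_id}"] = {"user": user_id, "text": text, "ts": ts}
    # Roughly: ZADD user:<userID>:tweets <ts> <tweetID>
    user_tweets.setdefault(f"user:{user_id}:tweets", {})[tweet_id] = ts
    return ts


def recent_tweet_ids(user_id, limit=800):
    """Newest-first tweet IDs, like ZREVRANGEBYSCORE ... +inf -inf LIMIT 0 <limit>."""
    zset = user_tweets.get(f"user:{user_id}:tweets", {})
    # Sort by (score, member) descending; ties on score fall back to the ID.
    ranked = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [tweet_id for tweet_id, _ in ranked[:limit]]
```

Because the score is the timestamp, ordering comes for free from the sorted set; no separate sort step is needed when reading.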

When a user follows a new person, you copy the list of IDs returned by this command into the follower's timeline (either in application code or with a Lua script). Add the WITHSCORES option so that each ID comes back with its timestamp. This timeline is again a sorted set with Unix timestamps as scores, so the copied tweets land in the correct chronological position among the existing ones. If you do the copy in application code, which is perfectly acceptable with Redis, don't forget to use pipelining to send your multiple writes to the sorted set in a single network round trip. It will greatly improve performance.
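To make the copy step concrete, and to show one way the 800 cap could be handled, here is a small sketch with dictionaries standing in for the sorted sets. The trimming step is my own assumption rather than something I know Twitter does: instead of calculating how many tweets to take from each followed user, it copies up to the cap from the new followee and then keeps only the newest entries overall, which in Redis would be a ZREMRANGEBYRANK call. The names latest and backfill_on_follow are hypothetical.

```python
TIMELINE_CAP = 800  # the cap mentioned in the question


def latest(zset, limit=TIMELINE_CAP):
    """Newest-first (tweet_id, timestamp) pairs, like
    ZREVRANGEBYSCORE key +inf -inf WITHSCORES LIMIT 0 <limit>."""
    ranked = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[:limit]


def backfill_on_follow(timeline, followee_tweets, cap=TIMELINE_CAP):
    """Copy the followee's recent tweet IDs into the follower's timeline,
    then trim the timeline back to the newest `cap` entries."""
    for tweet_id, ts in latest(followee_tweets, cap):
        timeline[tweet_id] = ts  # ZADD timeline <ts> <tweetID> (pipelined in practice)
    # Assumption: trim by rank, like ZREMRANGEBYRANK timeline 0 -(cap + 1).
    for tweet_id, _ in latest(timeline, len(timeline))[cap:]:
        del timeline[tweet_id]
    return timeline


# Usage with a tiny cap so the trimming is visible.
timeline = {"t1": 100, "t2": 300}
alice = {"a1": 50, "a2": 200, "a3": 400}
backfill_on_follow(timeline, alice, cap=3)
print(latest(timeline))  # [('a3', 400), ('t2', 300), ('a2', 200)]
```

With this approach the number of tweets kept per followed user is not computed up front; it simply falls out of the timestamp ordering.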

To get the timeline content, use pipelining too. Request the tweet IDs using ZREVRANGEBYSCORE with a LIMIT option and/or a timestamp as the lower bound if you don't want tweets posted before a certain date, then fetch the corresponding tweet:<tweetID> hashes in one pipelined batch.
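A sketch of that read path, again with dictionaries in place of Redis, might look like the following. The function name read_timeline, its parameters, and the choice to skip IDs whose hash no longer exists are assumptions made for the example.

```python
def read_timeline(timeline, tweets, limit=20, min_ts=float("-inf")):
    """Return tweet hashes newest first.

    timeline: {tweet_id: timestamp}, stand-in for the timeline sorted set
    tweets:   {"tweet:<tweetID>": {...}}, stand-in for the tweet hashes
    """
    # Step 1, like ZREVRANGEBYSCORE timeline +inf <min_ts> LIMIT 0 <limit>
    ranked = sorted(timeline.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    ids = [tweet_id for tweet_id, ts in ranked if ts >= min_ts][:limit]
    # Step 2, one HGETALL per ID; in Redis these would go out in a single pipeline.
    # Assumption: IDs whose hash is missing (e.g. a deleted tweet) are skipped.
    return [tweets[f"tweet:{i}"] for i in ids if f"tweet:{i}" in tweets]


# Usage: fetch at most two tweets with a timestamp of 2 or later.
timeline = {"a": 1, "b": 2, "c": 3}
tweets = {f"tweet:{i}": {"text": i.upper()} for i in timeline}
print(read_timeline(timeline, tweets, limit=2, min_ts=2))
# [{'text': 'C'}, {'text': 'B'}]
```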