Search code examples
algorithmtwitterpseudocode

How does the Twitter timeline algorithm work?


I'm trying to design a system similar to Twitter's timeline, but I can't wrap my head around how to get updates from so many followers while remaining efficient. Let's say I'm following 1000 people on Twitter. When I go to my feed, how does it know which tweets to show me? This is what I'm thinking, but it seems extremely inefficient and unlikely:

You have 10,000 friends.
In a for loop, loop through each friend, getting their latest 
  status updates since their last update. 

But that just seems ridiculous to loop through 10,000 friends. I can't imagine how else they'd do it though. Or would it be something like:

Someone I am following posted a tweet. That tweet is inserted in 
  an array containing the tweets of all people I am following.

But then that would seem weird, if I followed someone new who has 20,000 tweets, then 20,000 tweets would be inserted in my array, and if that person has millions of followers, then there are a million X 20,000 copies of the same set of tweets. So that also seems unlikely.

Anyone have any ideas how they could possibly do it?


Solution

  • I advice you to check the twissandra project they have implemented all the basic functionality of twitter using cassandra , a nosql database. It is said twitter is no longer using it for tweets .

    The old implementation can be consulted here