Tags: ruby, mongodb, couchdb, cassandra, tokyo-cabinet

Picking a database technology


We're setting out to build an online platform (API, Servers, Data, Wahoo!). For context, imagine that we need to build something like Twitter, but with the comments (tweets) organized around a live event. Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

Scalability is very important. We want to start out renting VPS slices, and scale from there. I'm a big fan of the cloud, and would like to remain there as long as possible. We'll probably be using Ruby.

I'm convinced that I want to try a document store instead of an RDBMS. I like the idea of schema-less storage and the promises of easier scalability by focusing on key-value.
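
To make "schema-less" and "key-value" concrete, here is a minimal sketch in Ruby. It is only an in-memory stand-in: the `TinyDocStore` class, its `put`/`get` methods, and the key names are hypothetical and are not the API of Couch, Mongo, or any other store mentioned here. It just shows documents of different shapes living under string keys with no schema declared up front.

```ruby
require "json"

# Hypothetical in-memory stand-in for a document store: a Hash mapping
# string keys to JSON-serialized documents. Not the API of any real database.
class TinyDocStore
  def initialize
    @docs = {}
  end

  # Serialize any Hash to JSON and store it under the given key.
  def put(key, doc)
    @docs[key] = JSON.generate(doc)
  end

  # Fetch and deserialize a document, or return nil if the key is unknown.
  def get(key)
    raw = @docs[key]
    raw && JSON.parse(raw)
  end
end

store = TinyDocStore.new

# Two documents with different fields sit side by side; nothing declares a schema.
store.put("event:1", { "type" => "event", "title" => "Launch day", "status" => "live" })
store.put("comment:1", { "type" => "comment", "event_id" => "event:1", "text" => "Here we go!" })

puts store.get("event:1")["status"]
puts store.get("comment:1")["text"]
```

A real document store would add persistence, indexing, and replication on top of this idea; the sketch only illustrates the data model.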

The problem is I don't know which technology is the most appropriate for our platform. I've looked at Couch, Mongo, Tokyo Cabinet, Cassandra, and an RDBMS with blobbed documents. Any help picking the right tool for this particular job?


Solution

  • Check out the NoSQL alternatives comparison by BJ Clark.

    Scalability is very important.

    Then you need to consider the excerpts from his blog:

    1. Tokyo Cabinet - Doesn't scale
    2. Redis - Doesn't scale
    3. Project Voldemort - scales
    4. MongoDB - limited (sharding is being implemented)
    5. Cassandra - scales
    6. Amazon S3 - scales
    7. Couch - Doesn't scale (Clustering & replication)
    8. MySQL - Doesn't scale

    Also consider Hypertable, another serious contender among the NoSQL alternatives. It's an open source implementation of Google's BigTable concept. I believe it scales well because it's used extensively by the Chinese search engine Baidu and the entertainment portal Rediff.

    You were saying:

    Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

    This is similar to Twitter's approach. Your choice of programming language is also very important: Twitter initially used Ruby for back-end message delivery, but they later said it was not the right choice and moved the entire message delivery system to Scala.

    They are still using Ruby for their front end. If you want a highly reliable, fault-tolerant system that is well suited to scalable environments, then you should consider Scala or Erlang.
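
The delivery requirement quoted above (event information first, comments allowed to wait) could be sketched roughly as follows. This is an illustration only, not how Twitter or any of the listed stores does it: the `Outbox` class, the `:event` and `:comment` kinds, and the payload strings are all hypothetical, and a real system would use a message queue across processes rather than in-memory arrays.

```ruby
# Hypothetical outbox with two tiers: pending event updates are always
# delivered before any pending comment.
class Outbox
  def initialize
    @event_updates = []
    @comments = []
  end

  # Queue a payload under one of the two known kinds.
  def push(kind, payload)
    case kind
    when :event then @event_updates << payload
    when :comment then @comments << payload
    else raise ArgumentError, "unknown kind: #{kind.inspect}"
    end
  end

  # Return the next payload to deliver, or nil when both queues are empty.
  def next_message
    return @event_updates.shift unless @event_updates.empty?
    @comments.shift
  end
end

outbox = Outbox.new
outbox.push(:comment, "comment: great start!")
outbox.push(:event, "event: score is now 1-0")
outbox.push(:comment, "comment: what a goal")

# Drain the outbox; the event update comes out first even though it was queued second.
delivered = []
while (message = outbox.next_message)
  delivered << message
end
puts delivered
```

Within each tier the order stays first-in, first-out, so comments are delayed but never reordered among themselves.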