Search code examples
database-designcassandrainstagram

Finding the best db design for design Instagram problem


I was reading the post given in link: https://www.educative.io/courses/grokking-the-system-design-interview/m2yDVZnQ8lG. I am having alot of difficulty in understanding the section Database Schema:

https://www.educative.io/courses/grokking-the-system-design-interview/m2yDVZnQ8lG#div-stylecolorblack-background-colore2f4c7-border-radius5px-padding5px6-database-schema

Now, this section recomends to store metadata for photos in a nosql store like cassandra.

The questions are:

  • With reference to the line as below, what exactly does it mean by that storing the list in different columns?

For the ‘UserPhoto’ table, the ‘key’ would be ‘UserID’, and the ‘value’ would be the list of ‘PhotoIDs’ the user owns, stored in different columns.

  • While it recommends, using a nosql store, how exactly will this be useful over a rdbms?

Solution

  • ... what exactly does it mean by that storing the list in different columns?

    I assume the table schema would look something like:

    CREATE TABLE user_photos
      userid text,
      photoid int,
      photopath varchar,
      ...
      PRIMARY KEY (userid, photoid)
    )
    

    The PRIMARY KEY for the table has the partition key as userid and photoid as a clustering column. It means that each record (identified by userid) in the table will have multiple "rows" of photoid (clustering column) since each user can have multiple photos.

    Cassandra is referred to as a wide-column store because data is stored in "wide columns" meaning columns are repeated one or more times as required. To illustrate using the example above, below is a representation of how a record is stored on disk:

    +----------+-----------+-----------+-----+-----------+
    | PK       | Column 1  | Column 2  | ... | Column n  |
    +----------+-----------+-----------+-----+-----------+
    | userid = | photoid = | photoid = | ... | photoid = |
    | 'abc123' | 56789012  | 78901234  | ... | 90123456  |
    +----------+-----------+-----------+-----+-----------+
    

    Each record can have one column, or a hundred columns. It depends on how many photos a user has. It isn't a fixed number of columns like traditional RDBMS tables.

    While it recommends, using a nosql store, how exactly will this be useful over a rdbms?

    A lot of use cases for NoSQL databases can't be modelled in the traditional two-dimensional RDBMS tables (columns running along the top, rows running down the page).

    Like the above example shows, Cassandra supports both the traditional 2D tables but also multi-dimensional tables.

    But more importantly, RDBMS cannot achieve scale in the same way that databases like Cassandra does. You can have hundreds or a thousand nodes in a Cassandra cluster and you can have nodes distributed across the globe. There are lots of features and attributes in NoSQL DBs and Cassandra that cannot be achieved with RDBMS. Cheers!