I was reading the post given in link: https://www.educative.io/courses/grokking-the-system-design-interview/m2yDVZnQ8lG. I am having alot of difficulty in understanding the section Database Schema:
Now, this section recomends to store metadata for photos in a nosql store like cassandra.
The questions are:
For the ‘UserPhoto’ table, the ‘key’ would be ‘UserID’, and the ‘value’ would be the list of ‘PhotoIDs’ the user owns, stored in different columns.
... what exactly does it mean by that storing the list in different columns?
I assume the table schema would look something like:
CREATE TABLE user_photos
userid text,
photoid int,
photopath varchar,
...
PRIMARY KEY (userid, photoid)
)
The PRIMARY KEY
for the table has the partition key as userid
and photoid
as a clustering column. It means that each record (identified by userid
) in the table will have multiple "rows" of photoid
(clustering column) since each user can have multiple photos.
Cassandra is referred to as a wide-column store because data is stored in "wide columns" meaning columns are repeated one or more times as required. To illustrate using the example above, below is a representation of how a record is stored on disk:
+----------+-----------+-----------+-----+-----------+
| PK | Column 1 | Column 2 | ... | Column n |
+----------+-----------+-----------+-----+-----------+
| userid = | photoid = | photoid = | ... | photoid = |
| 'abc123' | 56789012 | 78901234 | ... | 90123456 |
+----------+-----------+-----------+-----+-----------+
Each record can have one column, or a hundred columns. It depends on how many photos a user has. It isn't a fixed number of columns like traditional RDBMS tables.
While it recommends, using a nosql store, how exactly will this be useful over a rdbms?
A lot of use cases for NoSQL databases can't be modelled in the traditional two-dimensional RDBMS tables (columns running along the top, rows running down the page).
Like the above example shows, Cassandra supports both the traditional 2D tables but also multi-dimensional tables.
But more importantly, RDBMS cannot achieve scale in the same way that databases like Cassandra does. You can have hundreds or a thousand nodes in a Cassandra cluster and you can have nodes distributed across the globe. There are lots of features and attributes in NoSQL DBs and Cassandra that cannot be achieved with RDBMS. Cheers!