Search code examples
cassandradatabase-migration

In Cassandra does creating a tables with multiple columns take more space compared to multiple tables?


I have 6 tables in my database each consisting of approximate 12-15 columns and they have relationship with its id to main_table. I have to migrate my database to cassandra so I have a question should I create one main_table with consisting multiple columns or different table as in my mysql database.

Will creatimg multiple column take more space or multiple table will take more space


Solution

  • Your line of questioning is flawed. It is a common mistake for DBAs who only have a background in traditional relational databases to view data as normalised tables.

    When you switch to NoSQL, you are doing it because you are trying to solve a problem that traditional RDBMS can't. A paradigm shift is required since you can't just migrate relational tables the way they are, otherwise you're back to where you started.

    The principal philosophy of data modelling in Cassandra is that you need to design a CQL table for each application query. It is a one-to-one mapping between app queries and CQL tables. The crucial point is you need to start with the app queries, not the tables.

    Let us say that you have an application that stores information about users which include usernames, email addresses, first/last name, phone numbers, etc. If you have an app query like "get the email address for username X", it means that you need a table of email addresses and the schema would look something like:

    CREATE TABLE emails_by_username (
        username text,
        email text,
        firstname text,
        lastname text,
        ...
        PRIMARY KEY(username)
    )
    

    You would then query this table with:

    SELECT email FROM emails_by_username WHERE username = ?
    

    Another example is where you have an app query like "get the first and last names for a user where email address is Y". You need a table of users partitioned by email instead:

    CREATE TABLE users_by_email (
        email text,
        firstname text,
        lastname text,
        ...
        PRIMARY KEY(email)
    )
    

    You would query the table with:

    SELECT firstname, lastname FROM users_by_email WHERE email = ?
    

    Hopefully with these examples you can see that the disk space consumption is completely irrelevant. What is important is that you design your tables so they are optimised for the application queries. Cheers!