
Why do many refer to Cassandra as a column-oriented database?


Reading several papers and documents on the internet, I found a lot of contradictory information about the Cassandra data model. Many identify it as a column-oriented database, others as row-oriented, and still others define it as a hybrid of both.

From what I know about how Cassandra stores files, it uses the *-Index.db file to find the right position in the *-Data.db file, where the bloom filter, the column index, and then the columns of the required row are stored.

In my opinion, this is strictly row-oriented. Is there something I'm missing?


Solution

  • Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

    Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster.

    Row store means that, like relational databases, Cassandra organizes data by rows and columns.

    • Column-oriented (columnar) databases store data on disk column by column.

      For example, take a Bonuses table:

        ID         Last    First   Bonus
        1          Doe     John    8000
        2          Smith   Jane    4000
        3          Beck    Sam     1000
      
    • In a row-oriented database management system, the data would be stored like this: 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

    • In a column-oriented database management system, the data would be stored like this:
      1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

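    • Cassandra is basically a column-family store: the columns of a row are kept together under that row's key, rather than each column being stored separately as in a columnar database.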

    • Conceptually, Cassandra would store the above data as:

         "Bonuses" : {
               row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
               row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
               ...
         }
    
    • Also, the number of columns in each row doesn't have to be the same. One row can have 100 columns and the next row can have only 1 column.

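
The three layouts described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's actual on-disk (SSTable) format: the helper names `row_oriented`, `column_oriented`, and `row_store`, and the extra row with ID 4, are made up for this example.

```python
# Illustrative only: these helpers are hypothetical and do not reflect
# Cassandra's real storage format.

COLUMNS = ["ID", "Last", "First", "Bonus"]
ROWS = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]


def row_oriented(rows):
    """Serialize record by record: all values of one row sit together."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def column_oriented(rows):
    """Serialize column by column: all values of one column sit together."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


def row_store(rows, names):
    """Map each row key to that row's own {column name: value} dict.

    Missing (None) values are skipped, so rows may have different columns.
    """
    return {
        row[0]: {n: v for n, v in zip(names, row) if v is not None}
        for row in rows
    }


print(row_oriented(ROWS))
print(column_oriented(ROWS))

# A made-up fourth row with no First name or Bonus simply has fewer columns.
sparse = row_store(ROWS + [(4, "Lee", None, None)], COLUMNS)
print(sparse[1])
print(sparse[4])
```

The first two functions reproduce the row-wise and column-wise strings shown earlier. The third keeps every row's columns together under the row key, which is the shape of the "Bonuses" map above and shows why rows need not share the same set of columns.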