Tags: cassandra, nosql, cql, spring-data-cassandra

Cassandra data modeling when columns are dynamic


I am struggling with data modeling in Cassandra where different organizations have different attributes. Because there can be any number of attributes, I am unable to model a dynamic number of columns in the schema. Secondly, when I use a map for this, I am unable to query against those attributes or index them. Am I missing something, or is this a limitation of Cassandra?


Scenario

An organization selects specific attributes to collect data for, and it can change those attributes at any time. When it does, both the number and the names of the attributes change. If we were previously collecting data for attr1, attr2, attr3, we might now be collecting attr4, attr5, attr6, attr7, attr8, attr9. This can change at any time for any organization. Furthermore, organizations will search heavily on those attributes.

  1. How can we model such a scenario in Cassandra?
  2. If it is a limitation, what are the alternatives to Cassandra that handle reads and writes well (mostly writes with frequent reads, no updates or deletes)?
  3. Do we have to combine another framework with Cassandra, such as Lucene?

Thanks in advance.


Solution

  • This case really requires more information, such as which queries will be executed.

    In the simplest case, just put the attribute name as a clustering column in addition to the existing ones, like this:

    create table tbl (
      id int,
      collected timestamp,
      attr_name text,
      attr_value int,
      primary key(id, collected, attr_name));
    
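    Because every attribute is stored as its own row, an organization can start or stop collecting an attribute without any schema change. A minimal sketch of the writes, assuming id identifies the organization and using made-up values and timestamps:

    -- first collection: attr1..attr3 (values are hypothetical)
    insert into tbl (id, collected, attr_name, attr_value)
      values (1, '2024-01-01 00:00:00+0000', 'attr1', 10);
    insert into tbl (id, collected, attr_name, attr_value)
      values (1, '2024-01-01 00:00:00+0000', 'attr2', 20);
    insert into tbl (id, collected, attr_name, attr_value)
      values (1, '2024-01-01 00:00:00+0000', 'attr3', 30);

    -- later the same organization switches to other attributes; no alter table is needed
    insert into tbl (id, collected, attr_name, attr_value)
      values (1, '2024-02-01 00:00:00+0000', 'attr4', 40);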

    In this case, you can either select an individual attribute with:

    select * from tbl where id = ... and collected = ... and attr_name = 'attrX';
    

    or select all attributes by omitting attr_name:

    select * from tbl where id = ... and collected = ...;
    

    However, this works only when all attribute values have the same data type. If the types can differ, you may need to add a separate value column for every data type.
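
    As a rough sketch of that variant, the table could carry one value column per type and each row would fill in only the column that matches its attribute. The table name tbl_typed, the column names int_value, text_value and double_value, and the sample values are assumptions for illustration, not part of the original schema:

    create table tbl_typed (
      id int,
      collected timestamp,
      attr_name text,
      int_value int,        -- set only for integer attributes
      text_value text,      -- set only for text attributes
      double_value double,  -- set only for floating-point attributes
      primary key(id, collected, attr_name));

    -- each insert names only the value column for its type; the others stay unset
    insert into tbl_typed (id, collected, attr_name, int_value)
      values (1, '2024-01-01 00:00:00+0000', 'attr1', 10);
    insert into tbl_typed (id, collected, attr_name, text_value)
      values (1, '2024-01-01 00:00:00+0000', 'attr2', 'blue');

    select attr_name, int_value, text_value, double_value
      from tbl_typed where id = 1 and collected = '2024-01-01 00:00:00+0000';

    The reader of such a row has to know, or check, which value column is populated for a given attr_name.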