java, cassandra

Single data column vs multiple columns in Cassandra


I'm working on a project with an existing Cassandra database. The schema looks like this:

partition key (bigint) | clustering key1 (timestamp) | data (text)
-----------------------|-----------------------------|-------------------------
1                      | 2021-03-10 11:54:00.000     | {a:"somedata", b:2, ...}

My question is: is there any advantage to storing data as a JSON string? Will it save some space?

So far I have found only disadvantages:

  • You cannot (easily) add or drop fields at runtime, since an application that does not know about a field could overwrite the whole JSON string column and silently drop it.
  • Parsing the JSON string is currently the performance bottleneck.
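To make the first point concrete, here is a minimal in-memory sketch (no Cassandra involved) of why a single JSON column is fragile when application versions differ. It assumes a hypothetical older writer that only knows fields `a` and `b`, while a newer version has already added a field `c`. Each stored document is modeled as a `Map` standing in for the parsed JSON; the names `BlobOverwrite`, `updateBlob` and `updateColumn` are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class BlobOverwrite {
    // Fields known to a hypothetical older application version.
    static final Set<String> KNOWN_FIELDS = Set.of("a", "b");

    // Whole-blob update: the old writer maps the stored document onto the
    // fields it knows, changes one of them, and writes the entire document back.
    static Map<String, Object> updateBlob(Map<String, Object> stored, String key, Object value) {
        Map<String, Object> model = new HashMap<>();
        for (String field : KNOWN_FIELDS) {
            if (stored.containsKey(field)) {
                model.put(field, stored.get(field));
            }
        }
        model.put(key, value);
        return model; // replaces the stored document entirely, so unknown fields are lost
    }

    // Per-column update: only the targeted column is written; the other columns
    // keep their values (this models the merged row after an UPDATE of one column).
    static Map<String, Object> updateColumn(Map<String, Object> row, String column, Object value) {
        Map<String, Object> updated = new HashMap<>(row);
        updated.put(column, value);
        return updated;
    }

    public static void main(String[] args) {
        // "c" was added by a newer application version.
        Map<String, Object> stored = Map.of("a", "somedata", "b", 2, "c", true);
        System.out.println("blob rewrite:  " + updateBlob(stored, "b", 3));
        System.out.println("column update: " + updateColumn(stored, "b", 3));
    }
}
```

With one column per field, the older writer simply never touches `c`, so the problem does not arise.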

Solution

  • No, there is no real advantage to storing JSON as a string in Cassandra unless the underlying data is genuinely schema-less. It will not save space either; it actually uses more, because every row repeats each key name alongside its value instead of storing only the value.
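    A rough back-of-the-envelope comparison of the payload bytes, assuming a row whose JSON holds only the two fields from the question (`a` as text, `b` as an int) written as strict JSON. It counts payload bytes only and ignores Cassandra's per-cell metadata and SSTable compression, both of which affect the real on-disk size; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class PayloadSize {
    // Bytes needed for the JSON text itself (Cassandra stores text as UTF-8).
    static int jsonPayloadBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed when the same values live in typed columns:
    // a text column holds the UTF-8 bytes, a CQL int is 4 bytes.
    static int nativePayloadBytes(String textValue) {
        return textValue.getBytes(StandardCharsets.UTF_8).length + Integer.BYTES;
    }

    public static void main(String[] args) {
        String json = "{\"a\":\"somedata\",\"b\":2}";
        System.out.println("JSON payload:   " + jsonPayloadBytes(json) + " bytes");
        System.out.println("Native payload: " + nativePayloadBytes("somedata") + " bytes");
    }
}
```

    Here the JSON text takes 22 bytes against 12 for the native values; the difference is mostly the repeated key names, quotes and punctuation.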

    If you can, I would recommend mapping the keys to CQL columns so that the values are stored natively and the data can be accessed more flexibly. Cheers!
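    A minimal sketch of what that mapping could look like: a small helper that turns the JSON keys into typed CQL columns and prints the resulting `CREATE TABLE` statement. The table name `readings`, the key column names `id` and `ts`, and the key-to-type choices are hypothetical stand-ins for the real schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SchemaSketch {
    // Builds a CQL table definition with one typed column per former JSON key.
    // "id" and "ts" are hypothetical names for the partition and clustering keys.
    static String createTable(String table, Map<String, String> valueColumns) {
        StringJoiner cols = new StringJoiner(
                ",\n    ",
                "CREATE TABLE " + table + " (\n    ",
                ",\n    PRIMARY KEY ((id), ts)\n)");
        cols.add("id bigint");
        cols.add("ts timestamp");
        // Each JSON key becomes its own column with a native CQL type.
        valueColumns.forEach((name, type) -> cols.add(name + " " + type));
        return cols.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>(); // keeps column order stable
        columns.put("a", "text");
        columns.put("b", "int");
        System.out.println(createTable("readings", columns));
    }
}
```

    With typed columns, a new field can later be added with `ALTER TABLE readings ADD c text`, and an `UPDATE` that sets only `b` writes just that cell, so no JSON has to be parsed or rewritten.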