Tags: json, user-defined-types, cassandra-2.1

Serialized Json vs UDT implications in data and schema migrations in cassandra


I haven't been able to find a real answer on the implications of storing a type as serialized JSON versus using a UDT in Cassandra, so I'm reaching out in the hope that someone with experience can elaborate.

In terms of performance and of data and schema changes (adding, altering, removing columns), how do they differ?
What are some pros and cons of each approach?
In what other noteworthy ways do they differ?


Solution

  • There is a big difference and I'll try to explain it.

    UDTs are awesome if you want "strongly typed" fields in your CQL schema. You can use a UDT as part of your primary key (for example as a clustering column), and you can add and rename its fields later with ALTER TYPE. The downside is that in Cassandra 2.1 a UDT is frozen, so it is always stored, selected and written as a whole value, and you cannot remove a field from the type. Don't go too crazy with them, because they are a hell to maintain, especially if the same ones are used across multiple tables.

    Using a serialized JSON string is good for some cases. I've even heard of people storing compact binary-serialized data (protobuf) in fields to solve their problems (I think someone from SoundCloud was talking about this). The problem with JSON is that it is not typed, so you need additional logic in the application to handle serialization and changes to the data. On the other hand, this means you can have a variable structure and insert only the fields that you need.

    In the end it's about your preference, as long as you understand the pros and cons of both approaches.
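To illustrate the "additional logic on the application" that the JSON approach needs, here is a minimal sketch, assuming the value lives in a plain Cassandra `text` column and is encoded and decoded entirely in application code. The `Address` class and the `to_json` / `from_json` helpers are hypothetical names for illustration only, and the actual driver calls that write and read the column are not shown. The decoder tolerates a field added later (via a default) and silently drops a field the class no longer has, which is exactly the kind of change a UDT cannot do (fields can be added or renamed, not removed).

```python
import json
from dataclasses import dataclass, asdict, fields


# Hypothetical value type, stored as JSON in a text column.
@dataclass
class Address:
    street: str
    city: str
    # Field added after some rows were already written; the default
    # lets older JSON documents (which lack "zip") still decode.
    zip: str = ""


def to_json(addr: Address) -> str:
    """Serialize before writing to the (hypothetical) text column."""
    return json.dumps(asdict(addr), sort_keys=True)


def from_json(raw: str) -> Address:
    """Deserialize after reading, tolerating added and removed fields."""
    data = json.loads(raw)
    known = {f.name for f in fields(Address)}
    # Ignore keys the current class no longer defines (e.g. a field
    # that was "removed" from the schema in application code only).
    return Address(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    # An old row written before "zip" existed and with a since-removed "country".
    old_row = '{"street": "1 Main St", "city": "Springfield", "country": "US"}'
    print(from_json(old_row))
```

Running this prints `Address(street='1 Main St', city='Springfield', zip='')`. Note that Cassandra sees only an opaque string here: it cannot validate the fields, and a required field missing from old data still fails at decode time, so that validation burden sits entirely in the application.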