We are changing the partition key of one of our tables by removing one column from the partition key. Every record in this table also has a TTL. We want to preserve the data in this table along with its TTL. How can we do it?
We can create a new table with the desired schema and then copy data from the old table into it. However, we lose the TTL in this process.
For further context: this Cassandra table is populated by an Apache Storm application that reads events from Kafka. We could re-hydrate the Kafka messages, but Kafka also contains some unwanted messages that we don't want to process.
NOTE - The TTL is derived from a date column whose value never changes. Because of this, the TTL is always the same across all columns of a row.
Before getting to a specific implementation, it's important to understand that TTL is stored per cell, so individual cells may have a TTL different from the rest of the row. An INSERT or UPDATE statement can apply only one TTL value to all columns it touches, so if you have 2 columns with different TTLs, you'll need to perform 2 queries - one per column, each with its own TTL.
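For example (a minimal sketch using a hypothetical table `events` with columns `payload` and `status`; the names and TTL values are illustrative, not from the question):

```sql
-- Two statements are required because one statement can carry only one TTL:
UPDATE events USING TTL 86400  SET payload = 'abc' WHERE id = 1;
UPDATE events USING TTL 604800 SET status  = 'ok'  WHERE id = 1;
```

In your case this doesn't matter much: since the TTL is derived from an unchanging date column, all columns of a row share the same TTL and a single statement per row suffices.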
Regarding tooling, the DataStax Bulk Loader (dsbulk) is a more or less ready-to-use option: it can export data together with each column's write time and TTL, and re-apply them on load. First, unload the data:
dsbulk unload -h localhost -query \
"SELECT id, petal_length, WRITETIME(petal_length) AS w_petal_length, TTL(petal_length) AS l_petal_length, .... FROM dsbulkblog.iris_with_id" \
-url /tmp/dsbulkblog/migrate
Then load it back, re-applying the preserved values with USING TIMESTAMP and USING TTL:

dsbulk load -h localhost -query \
"BEGIN BATCH INSERT INTO dsbulkblog.iris_with_id(id, petal_length) VALUES (:id, :petal_length) USING TIMESTAMP :w_petal_length AND TTL :l_petal_length; ... APPLY BATCH;" \
-url /tmp/dsbulkblog/migrate --batch.mode DISABLED
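Alternatively, since the question notes that the TTL is derived from an unchanging date column, you don't strictly need to export the TTL at all: the remaining TTL can be recomputed at re-insert time (for example while re-hydrating from Kafka through Storm). A minimal sketch; `RETENTION_DAYS` and the column/date handling are assumptions, not from the question:

```python
from datetime import datetime, timedelta, timezone

# Assumption: rows expire RETENTION_DAYS after the value in the date column.
RETENTION_DAYS = 90

def remaining_ttl_seconds(event_date, now=None):
    """Seconds of TTL left for a row whose TTL was set from event_date.

    Returns 0 for rows that have already expired; such rows should be
    skipped on reload rather than re-inserted with TTL 0, because in
    Cassandra "USING TTL 0" means "never expire".
    """
    now = now or datetime.now(timezone.utc)
    expiry = event_date + timedelta(days=RETENTION_DAYS)
    return max(0, int((expiry - now).total_seconds()))
```

The computed value can then be bound to a `USING TTL ?` placeholder in the re-insert statement issued by the loader or the Storm topology, so every column of the row gets the same, correct remaining TTL.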