Search code examples
datedatetimecassandra

Cassandra Alter Column type from Timestamp to Date


Is there any way to alter the Cassandra column from timestamp to date without data lost? For example '2021-02-25 20:30:00+0000' to '2021-02-25'

If not, what is the easiest way to migrate this column(timestamp) to the new column(date)?


Solution

  • It's impossible to change a type of the existing column, so you need to add a new column with correct data type, and perform migration. Migration could be done via Spark + Spark Cassandra Connector - it could be most flexible solution, and even could be done via single node machine with Spark running in the local master mode (default). Code could look something like this (try on test data first):

    import pyspark.sql.functions as F
    options = { "table": "tbl", "keyspace": "ks"}
    spark.read.format("org.apache.spark.sql.cassandra").options(**options).load()\
      .select("pk_col1", "pk_col2", F.col("timestamp_col").cast("date").alias("new_name"))\
      .write.format("org.apache.spark.sql.cassandra").options(**options).save()
    

    P.S. you can use DSBulk, for example, but you need to have enough space to offload the data (although you need only primary key column + your timestamp)