Search code examples
cassandracql3cassandra-cli

In Cassandra, why dropping a column from tables defined with compact storage not allowed?


As per datastx documentation here, we cannot delete column from tables defined with COMPACT STORAGE option. What is the reason for this?


Solution

  • This goes back to the original implementation of CQL3, and changes which were made to allow it to abstract a "SQL-like," wide-row structure on top of the original Thrift-based storage engine. Ultimately, managing the schema comes down to whether or not the underlying structure is a table or a column_family.

    As an example, I'll create two tables using an old install of Apache Cassandra (2.1.19):

    CREATE TABLE student (
      studentid TEXT PRIMARY KEY,
      fname TEXT,
      name TEXT);
    
    CREATE TABLE studentcomp (
      studentid TEXT PRIMARY KEY,
      fname TEXT,
      name TEXT)
    WITH COMPACT STORAGE;
    

    I'll insert one row into each table:

    INSERT INTO student (studentid, fname, lname) VALUES ('janderson','Jordy','Anderson');
    INSERT INTO studentcomp (studentid, fname, lname) VALUES ('janderson','Jordy','Anderson');
    

    And then I'll look at the tables with the old cassandra-cli tool:

    [default@stackoverflow] list student;
    Using default limit of 100
    Using default cell limit of 100
    -------------------
    RowKey: janderson
    => (name=, value=, timestamp=1599248215128672)
    => (name=fname, value=4a6f726479, timestamp=1599248215128672)
    => (name=lname, value=416e646572736f6e, timestamp=1599248215128672)
    
    [default@stackoverflow] list studentcomp;
    Using default limit of 100
    Using default cell limit of 100
    -------------------
    RowKey: janderson
    => (name=fname, value=Jordy, timestamp=1599248302715066)
    => (name=lname, value=Anderson, timestamp=1599248302715066)
    

    Do you see the empty/"ghost" column value in the first result? That empty column value was CQL3's link between the column values and the table's meta data. If it's not there, then CQL cannot be used to manage a table's columns.

    The comparator used for type conversion was all that was really exposed via Thrift. This lack of meta data control/exposure is what allowed Cassandra to be considered "schemaless" in the pre-CQL days. If I run a describe studentcomp from within the cassandra-cli, I can see the comparators (validation class) used:

    Column Metadata:
      Column Name: lname
        Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Column Name: fname
        Validation Class: org.apache.cassandra.db.marshal.UTF8Type
    

    But if I try describe student, I see this:

    WARNING: CQL3 tables are intentionally omitted from 'describe' output.
    See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
    
    Sorry, no Keyspace nor (non-CQL3) ColumnFamily was found with name: student (if this is a CQL3 table, you should use cqlsh instead)
    

    Bascially, tables and column families were different entities forced into the same bucket. Adding WITH COMPACT STORAGE essentially made a table a column family. With that came the lack of any schema management (adding or removing columns), outside of access to the comparators.

    Edit 20200905

    Can we somehow / someway (hack) drop the columns from table?

    You might be able to accomplish this. Sylvain Lebresne wrote A Thrift to CQL3 Upgrade Guide which will have some necessary details for you. I also advise reading through the Jira ticket mentioned above (CASSANDRA-4377), as that covers many of the in-depth technical challenges that make this difficult.