Search code examples
cassandracqlgocqlgocqlx

How do I do a conditional upsert that preserves the value of a column?


I'm looking to mimic an upsert functionality but also add a conditional requirement that if the row doesn't already exist then insert all rows but if the row does exist, only update certain rows. I've spent a long time on this and can't seem to figure out a way to accomplish this with good practice.

For example, let's assume I have this table:

Issues
            id            TEXT,
            type          INT,
            creationDate  TIMESTAMP
PRIMARY KEY (id)
  1. If I try and upsert {id: 123, type: 2, creationDate: 1000} and the primary key doesn't exist, I insert all the columns into a new row.
  2. If I try and upsert again with {id: 123, type: 4, creationDate: 1053}, I want the behaviour to be that creationDate stays as 1000 to show when the first issue occurred but then I want the type to reflect the most recent issue type of '4'.
  3. So now the row should be {id: 123, type: 4, creationDate: 1000}

Is this possible in CQL and what would be the best way to approach this? So far, I've thought of batching an INSERT-IF-NOT-EXISTS and UPDATE and execute those together... or I've thought of running a SELECT and then in golang determining whether to INSERT or UPDATE based on if the row exists or not.

Note: My code is in golang and I'm using the gocqlx library to achieve this. My main goal is to get this to work in an efficient way that follows the best practices for cassandra/cql

Thanks! :)


Solution

  • CQL doesn't support the complex scenario you outlined.

    You can only specify a condition on the partition key, not a non-PK column in the the partition. There is no CQL syntax that prevents a column from getting overwritten if it already exists. Cheers!

    [UPDATE] After thinking about this a bit more, a possible workaround is to define the creation as a CQL set collection. Since the elements in a set are ordered, you know that the first element will always be the earliest creation date.

    To illustrate, here's my example table schema:

    CREATE TABLE community.myset (
        id int PRIMARY KEY,
        creation set<int>,
        type int
    )
    

    I can do an upsert regardless of whether it exists or not with:

    cqlsh> UPDATE myset SET creation = creation + {10} WHERE id = 1;
    
    cqlsh> SELECT * FROM myset ;
    
     id | creation | type
    ----+----------+------
      1 |     {10} | null
    

    And do it repeatedly:

    cqlsh> UPDATE myset SET creation = creation + {100} WHERE id = 1;
    
    cqlsh> SELECT * FROM myset ;
    
     id | creation  | type
    ----+-----------+------
      1 | {10, 100} | null
    

    I can then independently set the type with:

    cqlsh> UPDATE myset SET type=1 WHERE id = 1;
    
    cqlsh> SELECT * FROM myset ;
    
     id | creation  | type
    ----+-----------+------
      1 | {10, 100} |    1
    

    Hopefully this works for your use case. Cheers!