cassandra insert+update same key in single batch

we got a small problem while developing, data structure:

Connected to Test Cluster at kul:9160.
[cqlsh 4.1.1 | Cassandra 2.0.10.71 | DSE 4.5.2 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use ks1;
cqlsh:ks1> describe KEYSPACE ks1;

CREATE KEYSPACE ks1 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '1'
};
...

cqlsh:ks1> DESCRIBE table pairs;

CREATE TABLE pairs (
  key text,
  value text,
  PRIMARY KEY ((key))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

repro case 1:

cqlsh:ks1> CONSISTENCY
Current consistency level is ONE.
cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','DABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;

 key | value
-----+-------
 mim | EABCD

(1 rows)

cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;

 key | value
-----+-------
 mim | FABCD

(1 rows)

cqlsh:ks1> TRUNCATE pairs;

even changing consistency to all, repro case 2:

cqlsh:ks1> CONSISTENCY all;
Consistency level set to ALL.
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','DABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;

 key | value
-----+-------
 mim | EABCD

(1 rows)

cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;

 key | value
-----+-------
 mim | FABCD

(1 rows)

cqlsh:ks1>

as you can see, second batch in all cases do not update value. seems that there some kind of value comparison.

problem reproduced even on 4 node cluster or a single dse instance. i think this is not a problem of dse.

we hit a bug? or feature?

Solution

You are seeing the result of several properties of the Cassandra Database.

There is no difference between inserts and updates in a Cassandra database.
Cassandra follows a Last Write Wins (LWW) policy, that is the last change (by timestamp) to a given cell is the value of that cell. On tie's the higher value is selected.
Batches in Cassandra are by default applied with the same timestamp for every mutation in the batch (Although you can change this).

These properties together mean that if you batch several mutations to the same cell, the order of your statements will not actually effect the final state of that cell.

Lets take a look at one of your examples

BEGIN BATCH 
INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD'); 
UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; 
APPLY BATCH;

C* will take this batch and interpret it as

SET pairs key = ['mim'] value = ['FABCD'] at time X
SET pairs key = ['mim'] value = ['EABCD'] at time X

C* will apply both of these mutations but since they are tied for timestamp, the higher value wins and the final value of the cell will be

['FABCD']

If you really wanted to enforce an ordering in your batch (and I would consider remodeling your application so that you don't need to do this.) You can specify timestamps for each mutation so that you know what order they will be applied in.

I hope this explains the behavior you are seeing.