we got a small problem while developing, data structure:
Connected to Test Cluster at kul:9160.
[cqlsh 4.1.1 | Cassandra 2.0.10.71 | DSE 4.5.2 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use ks1;
cqlsh:ks1> describe KEYSPACE ks1;
CREATE KEYSPACE ks1 WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '1'
};
...
cqlsh:ks1> DESCRIBE table pairs;
CREATE TABLE pairs (
key text,
value text,
PRIMARY KEY ((key))
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
repro case 1:
cqlsh:ks1> CONSISTENCY
Current consistency level is ONE.
cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','DABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;
key | value
-----+-------
mim | EABCD
(1 rows)
cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;
key | value
-----+-------
mim | FABCD
(1 rows)
cqlsh:ks1> TRUNCATE pairs;
even changing consistency to all, repro case 2:
cqlsh:ks1> CONSISTENCY all;
Consistency level set to ALL.
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','DABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;
key | value
-----+-------
mim | EABCD
(1 rows)
cqlsh:ks1> TRUNCATE pairs;
cqlsh:ks1> BEGIN BATCH INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD'); UPDATE ks1.pairs SET value='EABCD' WHERE key='mim'; APPLY BATCH;
cqlsh:ks1> select * from pairs;
key | value
-----+-------
mim | FABCD
(1 rows)
cqlsh:ks1>
as you can see, second batch in all cases do not update value. seems that there some kind of value comparison.
problem reproduced even on 4 node cluster or a single dse instance. i think this is not a problem of dse.
we hit a bug? or feature?
You are seeing the result of several properties of the Cassandra Database.
There is no difference between inserts and updates in a Cassandra database.
Cassandra follows a Last Write Wins (LWW) policy, that is the last change (by timestamp) to a given cell is the value of that cell. On tie's the higher value is selected.
Batches in Cassandra are by default applied with the same timestamp for every mutation in the batch (Although you can change this).
These properties together mean that if you batch several mutations to the same cell, the order of your statements will not actually effect the final state of that cell.
Lets take a look at one of your examples
BEGIN BATCH
INSERT INTO ks1.pairs (key,value) VALUES ('mim','FABCD');
UPDATE ks1.pairs SET value='EABCD' WHERE key='mim';
APPLY BATCH;
C* will take this batch and interpret it as
SET pairs key = ['mim'] value = ['FABCD'] at time X
SET pairs key = ['mim'] value = ['EABCD'] at time X
C* will apply both of these mutations but since they are tied for timestamp, the higher value wins and the final value of the cell will be
['FABCD']
If you really wanted to enforce an ordering in your batch (and I would consider remodeling your application so that you don't need to do this.) You can specify timestamps for each mutation so that you know what order they will be applied in.
I hope this explains the behavior you are seeing.