I'm running a Cassandra 3.9 cluster, and today I noticed NULL values in some generated reports.
I opened up cqlsh and after some queries I noticed that null values are appearing all over the data, apparently in random columns.
Replication factor is 3.
I've started a nodetool repair on the cluster, but it hasn't finished yet.
I searched for this behavior and could not find it described anywhere, so the random appearance of NULL values in columns is apparently not a common problem.
Does anyone know what's going on? This kind of data corruption seems pretty serious. Thanks in advance for any ideas.
ADDED Details:
Happens on columns that are frequently updated with toTimestamp(now()), which never returns NULL, so it's not about null data going in.
Happens on immutable columns that are only inserted once and never changed. (But other columns on the table are frequently updated.)
Do updates cause this like deletions do? Seems kinda serious to me, to wake up to a bunch of NULL values.
I also know specifically some of the data that has been lost: I've already identified three important entries that are missing. These have not been deleted for sure. One specific table that has no deletions at all is full of NULLs everywhere.
I am the sole admin and nobody ran any nodetool commands overnight, 100% sure.
UPDATE
nodetool repair has been running for 6+ hours now and it fully recovered the data on one varchar column, "item description".
It's a Cassandra issue and no, there were no deletions at all. And like I said, columns written by functions that never return null (toTimestamp(now())) had null in them.
UPDATE 2
So nodetool repair finished overnight, but the NULLs were still there in the morning.
So I went node by node, stopping and restarting them, and voilà, the NULLs are gone and there was no data loss.
This is a major league bug if you ask me. I don't have the resources now to go after it, but if anyone else faces this here's the simple "fix":
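Run nodetool repair -dcpar to fix all nodes in the datacenter, then stop and restart the nodes one at a time, since in my case the NULLs only disappeared after the restarts.
I faced a similar issue some months ago and it's explained quite well in the following blog (this is not written by me): WAT - Cassandra: Row level consistency #$@&%*!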
The null values actually have been caused by updates in this case.
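To make the idea concrete, here is a toy Python sketch of per-cell, last-write-wins reconciliation. It is not Cassandra's code and not the blog's code. It only models the rules as I understand them: the higher write timestamp wins for each column independently, and on a timestamp tie a tombstone (shown as null) beats a live value, otherwise the greater value wins. The column names, values, and timestamps are made up for illustration.

```python
from typing import Dict, NamedTuple, Optional


class Cell(NamedTuple):
    """One column value plus its write timestamp (hypothetical toy model)."""
    value: Optional[str]  # None models a tombstone, which reads back as null
    timestamp: int


Row = Dict[str, Cell]


def reconcile_cell(a: Cell, b: Cell) -> Cell:
    """Pick the winning version of a single cell."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Timestamp tie: a tombstone beats a live value.
    if a.value is None or b.value is None:
        return a if a.value is None else b
    # Both live: the greater value wins (string comparison stands in
    # for a byte-wise comparison in this toy model).
    return a if a.value >= b.value else b


def reconcile_row(x: Row, y: Row) -> Row:
    """Merge two versions of a row column by column, not as a unit."""
    merged: Row = dict(x)
    for column, cell in y.items():
        if column in merged:
            merged[column] = reconcile_cell(merged[column], cell)
        else:
            merged[column] = cell
    return merged


def visible(row: Row) -> Dict[str, Optional[str]]:
    """What a reader would see: values only, tombstones as None."""
    return {column: cell.value for column, cell in row.items()}


# Two writes to the same row that happen to carry the same timestamp.
write_a: Row = {
    "item_description": Cell("widget", 1000),
    "status": Cell("new", 1000),
}
write_b: Row = {
    "item_description": Cell(None, 1000),  # this write sets the column to null
    "status": Cell("archived", 1000),
}

result = visible(reconcile_row(write_a, write_b))
print(result)
```

In this model the merged row takes item_description from the second write (null) and status from the first write, so the reader sees a row that neither writer ever wrote. Whether this is exactly what happened on the cluster in the question is an assumption on my part.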