So this is driving me crazy. i tried querying one of my table in Cassandra and it showed query failure. i tried digging dip in to the reason behind it and found that it was because of tombstone. i changed GC_GRACE_SECONDS to Zero and triggered Compaction using nodetool, And when i queried again it worked fine. however on a subsequent calls query failed again with a same reason. i am using cassandra-nodejs driver. This is my data model.
CREATE TABLE my_table (
firstname text,
lastname text,
email text,
mobile text,
date timeuuid,
value float,
PRIMARY KEY (firstname, lastname, email, mobile)
) WITH CLUSTERING ORDER BY (lastname ASC, email ASC, mobile ASC);
this is the query i want to perform on that data model.
SELECT firstname, email, toDate(date) as date, mobile, value FROM my_table WHERE date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= minTimeuuid('2017-03-28 23:59:59+0000') ALLOW FILTERING;
the result will have approx 40k rows. this shows that if we delete something it will be marked as tombstone and will get deleted After GC_GRACE_SECONDS setted for given table. If i understand it correctly then.
I am using cassandra 3.5, with cqlsh version 5.0.1. And the query works fine with terminal, but gives error when we execute using external client (express app using nodejs driver for cassandra). i have a single node cluster app.
This is the log of my Inserted null value in field (i inserted only name and timestamp);
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
Execute CQL3 query | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 0
Parsing select * FROM testtomb WHERE name = 'Dhaval45'; [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 64
Preparing statement [SharedPool-Worker-2] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 101
Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 210
Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 223
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 243
Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342000 | 172.31.34.179 | 288
Read 2 live and 0 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 | 310
Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:28:27.342001 | 172.31.34.179 | 323
Request complete | 2017-03-29 10:28:27.342385 | 172.31.34.179 | 385
And this is the log when i query on filed which i have executed a delete query. Initially user Dhaval15 has firstname 'aaaa' and then i the cell aaa. then again executing select query on same user gave me this log.
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
Execute CQL3 query | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 0
Parsing select * FROM testtomb WHERE name = 'Dhaval15'; [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 65
Preparing statement [SharedPool-Worker-1] | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 113
Executing single-partition query on testtomb [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 223
Acquiring sstable references [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 235
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2017-03-29 10:35:18.581000 | 172.31.34.179 | 256
Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 | 305
Read 1 live and 1 tombstone cells [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 | 338
Merged data from memtables and 0 sstables [SharedPool-Worker-3] | 2017-03-29 10:35:18.581001 | 172.31.34.179 | 351
Request complete | 2017-03-29 10:35:18.581430 | 172.31.34.179 | 430
In Cassandra tombstone created even if you don't execute delete query, when you insert null value.
Tombstone consume space. When you execute select query cassandra needs to filter out data by tombstone. If huge tombstone generated your select query performance will degrade.
Your query failing because of huge tombstone and ALLOW FILTERING
. Don't use ALLOW FILTERING
on production. it's very costy. When you execute query without specifying partition key, Cassandra needs to scan all the row of all the nodes.
Change your data model to like the below one :
CREATE TABLE my_table (
year int,
month int,
date timeuuid,
email text,
firstname text,
lastname text,
mobile text,
value float,
PRIMARY KEY ((year, month), date)
);
Here you can specify year and month extract from the date.
Now you can query with specifying partition key :
SELECT * FROM my_table WHERE year = 2017 AND month = 03 AND date >= minTimeuuid('2017-03-25 00:00:00+0000') AND date <= minTimeuuid('2017-03-28 23:59:59+0000') ;
This will return result very efficiently and will not fail.
If you need to query with firstname and lastname create an index on them
CREATE INDEX index_firstname ON my_table (firstname) ;
CREATE INDEX index_lastname ON my_table (lastname) ;
Then you can query with firstname or last name
SELECT * FROM my_table WHERE firstname = 'ashraful' ;
SELECT * FROM my_table WHERE lastname = 'islam' ;
Here i have not create index on email and phone because of high cardinality problem. Instead create materialized view or another table to query with phone or email
CREATE MATERIALIZED VIEW mview_mobile AS
SELECT *
FROM my_table
WHERE mobile IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
PRIMARY KEY (mobile, year, month, date);
CREATE MATERIALIZED VIEW mview_email AS
SELECT *
FROM my_table
WHERE email IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND date IS NOT NULL
PRIMARY KEY (email, year, month, date);
Now you can query with phone or email
SELECT * FROM mview_mobile WHERE mobile = '018..';
SELECT * FROM mview_email WHERE email = 'ashraful@...';
More about cassandra tombstone : http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html