I have a table like this in Cassandra (2.1.15.1423) with over 14,000,000 records:
CREATE TABLE keyspace.table (
field1 text,
field2 text,
field3 text,
field4 uuid,
field5 map<text, text>,
field6 list<text>,
field7 text,
field8 list<text>,
field9 list<text>,
field10 text,
field11 list<text>,
field12 text,
field13 text,
field14 text,
field15 list<frozen<user_defined_type>>,
field16 text,
field17 text,
field18 text,
field19 text,
PRIMARY KEY ((field1, field2, field3), field4)
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
In the application I use Python (cassandra-driver==3.1.1) and Go (gocql).
Problem:
I need to move records from this table to another. When I try to get the data (even without filters), everything stops and I get a timeout error. I tried changing fetch_size/page_size; the result is the same, just after a few minutes of waiting.
If you are going to move records from this table to another table, you should do it one token range at a time. Something like

SELECT * FROM keyspace.table

will not work in a highly distributed datastore such as Cassandra. A query like the one above requires a full cluster scan and a scatter/gather operation to satisfy it; this is an anti-pattern in C* and will cause timeouts in most cases. A better approach is to query only one partition (or a small token range) at a time, which the datastore can serve very quickly. A common pattern for this sort of operation is to iterate through the token ranges of the table one at a time and process each range individually, so that each query only touches a small slice of the data.