I want to write 100 MB partitions using the stress tool in Cassandra 2.1.17. For the sake of simplicity, I'm first trying to write a single partition with a single column. My stress YAML looks like this:
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: insanitytest
table_definition: |
  CREATE TABLE insanitytest (
        name text,
        value blob,
        PRIMARY KEY(name)
  );
columnspec:
  - name: value
    size: FIXED(100000000)
insert:
  partitions: fixed(1)       # number of unique partitions to update in a single operation
                             # if batchcount > 1, multiple batches will be used but all partitions will
                             # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED          # type of batch to use
  select: fixed(1)/1         # uniform chance any single generated CQL row will be visited in a partition;
                             # generated for each partition independently, each time we visit it
queries:
  simple1:
    cql: select * from insanitytest where name = ? LIMIT 100
    fields: samerow          # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
I'm running it with:
./tools/bin/cassandra-stress user profile=~/Software/cassandra/tools/cqlstress-insanity-example.yaml n=1 "ops(insert=1,simple1=0)"
Looking at the output I have:
Connected to cluster: Test Cluster
Datatacenter: datacenter1; Host: localhost/; Rack: rack1
Created schema. Sleeping 1s for propagation.
Sleeping 2s...
Running with 4 threadCount
Running [insert, simple1] with 4 threads for 1 iteration
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
Generating batches with [1..1] partitions and [1..1] rows (of [1..1] total rows in the partitions)
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (1 replica were required but only 0 acknowledged the write)
insert, 1, 0, 0, 0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 4.0, -0.00000, 0, 1, 34, 34, 0, 219
simple1, 0, NaN, NaN, NaN, NaN, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.00000, 0, 1, 34, 34, 0, 219
total, 1, 0, 0, 0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 3985.0, 4.0, -0.00000, 0, 1, 34, 34, 0, 219
op rate : 0 [insert:0, simple1:NaN]
partition rate : 0 [insert:0, simple1:NaN]
row rate : 0 [insert:0, simple1:NaN]
latency mean : 3985.0 [insert:3985.0, simple1:NaN]
latency median : 3985.0 [insert:3985.0, simple1:0.0]
latency 95th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency 99th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency 99.9th percentile : 3985.0 [insert:3985.0, simple1:0.0]
latency max : 3985.0 [insert:3985.0, simple1:0.0]
Total partitions : 1 [insert:1, simple1:0]
Total errors : 0 [insert:0, simple1:0]
total gc count : 1
total gc mb : 219
total gc time (s) : 0
avg gc time(ms) : 34
stdev gc time(ms) : 0
Total operation time : 00:00:03
However, 'nodetool tpstats' shows one completed mutation, so the mutation seems to have succeeded despite the timeout:
Pool Name                   Active   Pending   Completed   Blocked  All time blocked
MutationStage                    0         0           1         0                 0
ReadStage                        0         0          33         0                 0
RequestResponseStage             0         0           0         0                 0
ReadRepairStage                  0         0           0         0                 0
CounterMutationStage             0         0           0         0                 0
MiscStage                        0         0           0         0                 0
HintedHandoff                    0         0           0         0                 0
GossipStage                      0         0           0         0                 0
CacheCleanupExecutor             0         0           0         0                 0
InternalResponseStage            0         0           0         0                 0
CommitLogArchiver                0         0           0         0                 0
CompactionExecutor               0         0          30         0                 0
ValidationExecutor               0         0           0         0                 0
MigrationStage                   0         0           3         0                 0
AntiEntropyStage                 0         0           0         0                 0
PendingRangeCalculator           0         0           1         0                 0
Sampler                          0         0           0         0                 0
MemtableFlushWriter              0         0          13         0                 0
MemtablePostFlush                0         0          24         0                 0
MemtableReclaimMemory            0         0          13         0                 0
Native-Transport-Requests        0         0         170         0                 0
Message type Dropped
But if I do 'nodetool flush' and 'nodetool status stresscql', this is what I get:
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 131.99 KB 256 100.0% 285b13ec-0b9b-4325-9095-c5f5c0f51079 rack1
Since no messages were dropped, where did the data go? From my understanding, I should see a value of ~100 MB in the Load column, right?
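The issue was not in the stress tool or the data definition but in commitlog_segment_size_in_mb. As far as I can tell, a single mutation can be at most half the size of a commit log segment, so the segment size needs to be at least twice the size of the data being written; the default of 32 MB is far too small for a 100 MB blob. More info in this answer.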
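To make the sizing concrete, here is a small Python sketch of the arithmetic. It assumes the cap on a single mutation is half of commitlog_segment_size_in_mb and ignores the serialization overhead of the rest of the mutation, so treat the result as a lower bound. The helper names are made up for illustration and are not part of Cassandra.

```python
import math

MIB = 1024 * 1024  # bytes per MB as used by the *_in_mb settings


def max_mutation_bytes(segment_size_mb: int) -> int:
    """Largest mutation accepted, assuming the cap is half a commit log segment."""
    return segment_size_mb * MIB // 2


def min_segment_size_mb(mutation_bytes: int) -> int:
    """Smallest whole-MB segment size whose half-segment cap fits the mutation."""
    return math.ceil(2 * mutation_bytes / MIB)


payload = 100_000_000  # size: FIXED(100000000) from the stress profile

print(max_mutation_bytes(32))        # cap with the default 32 MB segment: 16777216
print(min_segment_size_mb(payload))  # segment size needed for the blob alone: 191
```

Under these assumptions the default segment caps a mutation at about 16 MB, and the 100 MB blob needs commitlog_segment_size_in_mb of at least 191, plus some headroom for the key and per-entry overhead.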