Upfront warning: I'm a Cassandra beginner.
I have set up a 4-node m3.xlarge cluster on AWS using the DataStax Enterprise AMI and loaded data using the Cassandra bulk loader approach.
The Cassandra version is "ReleaseVersion: 2.1.9.791".
One of the four nodes, the one I started the bulk load from, seems to be stuck in compaction (nothing has changed for the last 12 hours):
```
$ nodetool compactionstats
pending tasks: 1
compaction type   keyspace   table     completed         total    unit   progress
     Compaction        xxx     yyy   60381305196   66396499686   bytes     90.94%
Active compaction remaining time : 0h05m58s
```
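To put numbers on the stuck compaction, the completed and total columns of the output above can be turned into a remaining-bytes figure. This is a minimal sketch, assuming the whitespace-separated column order shown (type, keyspace, table, completed, total, unit, progress); `parse_row` and `CompactionRow` are hypothetical names, not part of nodetool or any Cassandra library.

```python
# Hypothetical helper: parse one data row of `nodetool compactionstats`
# (column layout as shown above) and report how much work is left.
from typing import NamedTuple


class CompactionRow(NamedTuple):
    compaction_type: str
    keyspace: str
    table: str
    completed: int
    total: int
    unit: str

    @property
    def remaining(self) -> int:
        # Gap between the "total" and "completed" columns.
        return self.total - self.completed

    @property
    def progress(self) -> float:
        return 100.0 * self.completed / self.total


def parse_row(line: str) -> CompactionRow:
    # Columns are whitespace separated; the trailing progress column is
    # recomputed from completed/total instead of being parsed.
    ctype, ks, table, completed, total, unit = line.split()[:6]
    return CompactionRow(ctype, ks, table, int(completed), int(total), unit)


row = parse_row("Compaction xxx yyy 60381305196 66396499686 bytes 90.94%")
print(f"{row.progress:.2f}% done, {row.remaining / 2**30:.1f} GiB left")
```

For the row above it prints `90.94% done, 5.6 GiB left`.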
I have also noticed that this node sometimes becomes unavailable (it goes red in OpsCenter), but after a while (a long while) it becomes available again.
The Cassandra log contains the exception below. What is weird, though, is that there is lots of disk space left.
```
ERROR [MemtableFlushWriter:3] 2015-10-29 23:54:21,511 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:3,5,main]
org.apache.cassandra.io.FSWriteError: java.io.IOException: No space left on device
    at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:663) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:500) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.io.sstable.SSTableWriter.finish(SSTableWriter.java:453) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:445) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:440) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:389) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:335) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.1.jar:na]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1154) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_80]
    at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_80]
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_80]
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_80]
    at org.apache.cassandra.io.util.DataOutputStreamPlus.flush(DataOutputStreamPlus.java:55) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:657) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    ... 12 common frames omitted
```
The tpstats output is:
```
$ nodetool tpstats
Pool Name                 Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage           0         0           0         0                  0
ReadStage                      0         0       19485         0                  0
RequestResponseStage           0         0      116191         0                  0
MutationStage                  0         0      386132         0                  0
ReadRepairStage                0         0         848         0                  0
GossipStage                    0         0       46669         0                  0
CacheCleanupExecutor           0         0           0         0                  0
AntiEntropyStage               0         0           0         0                  0
MigrationStage                 0         0           1         0                  0
Sampler                        0         0           0         0                  0
ValidationExecutor             0         0           0         0                  0
CommitLogArchiver              0         0           0         0                  0
MiscStage                      0         0           0         0                  0
MemtableFlushWriter            0         0          80         0                  0
MemtableReclaimMemory          0         0          79         0                  0
PendingRangeCalculator         0         0           4         0                  0
MemtablePostFlush              1        33         127         0                  0
CompactionExecutor             1         1       27492         0                  0
InternalResponseStage          0         0           4         0                  0
HintedHandoff                  0         0           3         0                  0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
PAGED_RANGE               0
BINARY                    0
READ                      0
MUTATION                  0
_TRACE                    0
REQUEST_RESPONSE          0
COUNTER_MUTATION          0
```
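To see at a glance which thread pools are backed up, the tpstats table can be filtered for rows with active or pending tasks. This is a minimal sketch over an excerpt of the output above, assuming the six-column pool row layout shown (name, active, pending, completed, blocked, all-time blocked); `busy_pools` is a hypothetical helper, not part of nodetool.

```python
# Hypothetical helper: list thread pools from `nodetool tpstats` output that
# have active or pending tasks (column layout as shown above).
SAMPLE = """\
Pool Name Active Pending Completed Blocked All time blocked
MemtableFlushWriter 0 0 80 0 0
MemtablePostFlush 1 33 127 0 0
CompactionExecutor 1 1 27492 0 0
HintedHandoff 0 0 3 0 0
Message type Dropped
MUTATION 0
"""


def busy_pools(tpstats: str) -> dict[str, tuple[int, int]]:
    busy = {}
    for line in tpstats.splitlines():
        fields = line.split()
        # Pool rows have exactly six fields with a numeric "Active" column;
        # the header and the "Message type / Dropped" rows are skipped.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        name, active, pending = fields[0], int(fields[1]), int(fields[2])
        if active or pending:
            busy[name] = (active, pending)
    return busy


print(busy_pools(SAMPLE))
```

For this excerpt it reports MemtablePostFlush (1 active, 33 pending) and CompactionExecutor (1 active, 1 pending).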
Does anyone have tips on how to make that hanging compaction go away, and on why this happens in the first place?
All tips tremendously appreciated!
Tx
Peter
Let's assume you're using SizeTieredCompactionStrategy and you have four SSTables of size X. A compaction will merge them into one SSTable of size Y, and this process repeats itself.
Problem: a compaction creates a new SSTable of size Y, and both the new SSTable and the old SSTables of size X exist on disk while the compaction is running.
In the worst case (no deletes or overwrites, so Y is roughly 4X), a compaction requires twice the on-disk space used by the SSTables being compacted. More specifically, at certain points you need enough disk space to hold both the four SSTables of size X and the new SSTable of size Y.
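The worst-case arithmetic can be sketched as a small model. It assumes the merged SSTable's size is simply the surviving fraction of its inputs and ignores compression and index/metadata overhead; `peak_bytes`, `live_fraction`, and the 15 GiB SSTable size are illustrative and not taken from the question.

```python
# Hypothetical model of disk usage during one size-tiered compaction:
# the input SSTables stay on disk until the merged output is complete.
def peak_bytes(input_sizes: list[int], live_fraction: float = 1.0) -> int:
    """Peak on-disk bytes for the SSTables involved in one compaction.

    live_fraction is the share of input data that survives the merge
    (1.0 = no deletes or overwrites, the worst case)."""
    if not 0.0 <= live_fraction <= 1.0:
        raise ValueError("live_fraction must be between 0 and 1")
    inputs = sum(input_sizes)              # the four SSTables of size X
    output = int(inputs * live_fraction)   # the new SSTable of size Y
    return inputs + output                 # old inputs and new output coexist


GiB = 2**30
four_x = [15 * GiB] * 4                    # example: X = 15 GiB
print(peak_bytes(four_x) / GiB)            # worst case: 60 GiB in + 60 GiB out
print(peak_bytes(four_x, 0.5) / GiB)       # half the data shadowed: 60 + 30
```

This prints `120.0` and then `90.0`: even with plenty of deletes, the node briefly needs room for both generations of the data.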
So even though it seems that you have enough space left, you might run out of disk space during compaction.
You might want to try LeveledCompactionStrategy, because it needs much less free space for compaction (roughly 10 x sstable_size_in_mb). See also http://www.datastax.com/dev/blog/when-to-use-leveled-compaction for when to use LeveledCompactionStrategy.
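To compare the two rules of thumb directly, here is a rough sketch. It assumes the STCS worst case is a single tier holding all of a table's data (60 GiB is an arbitrary example) and uses 160 MB, the default LCS `sstable_size_in_mb` in Cassandra 2.1; `stcs_headroom` and `lcs_headroom` are illustrative names only.

```python
# Rough free-space headroom one compaction may need, per the rules of thumb
# above: STCS worst case ~ size of the data being merged (the new SSTable Y),
# LCS ~ 10 x sstable_size_in_mb. Function names are illustrative only.
MiB = 2**20
GiB = 2**30


def stcs_headroom(bytes_in_compaction: int) -> int:
    # Worst case (no deletes/overwrites): the output is as large as its inputs.
    return bytes_in_compaction


def lcs_headroom(sstable_size_in_mb: int = 160) -> int:
    # 160 MB is the default sstable_size_in_mb for LCS in Cassandra 2.1.
    return 10 * sstable_size_in_mb * MiB


table_size = 60 * GiB  # example: all of a table's data ends up in one STCS tier
print(f"STCS: {stcs_headroom(table_size) // MiB} MiB")
print(f"LCS:  {lcs_headroom() // MiB} MiB")
```

This prints 61440 MiB for STCS against 1600 MiB for LCS. The trade-off is that LCS does more compaction I/O in exchange for that smaller headroom, which is what the linked post helps you weigh.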
No matter which compaction strategy you use, you should always leave enough free disk space to accommodate streaming, repair, and snapshots.
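A simple pre-flight check along these lines can be scripted. This is a minimal sketch using Python's `shutil.disk_usage`; `has_headroom` and the 10% default margin are hypothetical choices, and the bytes you pass in (for example, the size of your largest expected compaction) are up to you to estimate.

```python
import shutil


def has_headroom(data_dir: str, needed_bytes: int, margin: float = 0.1) -> bool:
    """Return True if the filesystem holding data_dir has at least
    needed_bytes free, plus a relative safety margin (10% by default)."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    free = shutil.disk_usage(data_dir).free
    return free >= needed_bytes * (1 + margin)
```

For example, `has_headroom("/var/lib/cassandra/data", 66_396_499_686)` would check for room to write another copy of the compaction shown in the question. That path is an assumption (a common location for package installs); check `data_file_directories` in cassandra.yaml, since the DataStax AMI may mount its data volume elsewhere.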