cassandra, datastax-enterprise

Cassandra - compaction stuck


Upfront warning - I'm a Cassandra beginner.

I have set up a 4-node m3.xlarge cluster on AWS using the DataStax Enterprise AMI and loaded data using the Cassandra bulk loader approach.

Cassandra version is "ReleaseVersion: 2.1.9.791"

One of the four nodes - the one I started the bulk load from - seems to be stuck in compaction (nothing has changed for the last 12 hours):

$ nodetool compactionstats
pending tasks: 1
   compaction type   keyspace          table     completed         total    unit   progress
        Compaction   xxx   yyy   60381305196   66396499686   bytes     90.94%
Active compaction remaining time :   0h05m58s

I have also noticed that the node sometimes becomes unavailable (goes red in OpsCenter), but after a while (a long while) it becomes available again.

The Cassandra log contains an exception (see below). What is weird, though, is that there is lots of disk space left.

> ERROR [MemtableFlushWriter:3] 2015-10-29 23:54:21,511 
> CassandraDaemon.java:223 - Exception in thread
> Thread[MemtableFlushWriter:3,5,main]
> org.apache.cassandra.io.FSWriteError: java.io.IOException: No space
> left on device
>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:663)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:500)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.io.sstable.SSTableWriter.finish(SSTableWriter.java:453)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:445)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:440)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:389)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:335)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
> ~[guava-16.0.1.jar:na]
>         at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1154)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80] Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_80]
>         at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_80]
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> ~[na:1.7.0_80]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> ~[na:1.7.0_80]
>         at org.apache.cassandra.io.util.DataOutputStreamPlus.flush(DataOutputStreamPlus.java:55)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:657)
> ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
>         ... 12 common frames omitted

The tpstats output is:

$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0          19485         0                 0
RequestResponseStage              0         0         116191         0                 0
MutationStage                     0         0         386132         0                 0
ReadRepairStage                   0         0            848         0                 0
GossipStage                       0         0          46669         0                 0
CacheCleanupExecutor              0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              1         0                 0
Sampler                           0         0              0         0                 0
ValidationExecutor                0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MiscStage                         0         0              0         0                 0
MemtableFlushWriter               0         0             80         0                 0
MemtableReclaimMemory             0         0             79         0                 0
PendingRangeCalculator            0         0              4         0                 0
MemtablePostFlush                 1        33            127         0                 0
CompactionExecutor                1         1          27492         0                 0
InternalResponseStage             0         0              4         0                 0
HintedHandoff                     0         0              3         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

Does anyone have tips on how to make that hanging compaction go away, and on why this happens in the first place?

All tips tremendously appreciated!

Thanks,

Peter


Solution

  • Let's assume you're using SizeTieredCompaction and you have four SSTables of size X. A compaction will merge them into one SSTable of size Y, and this process repeats itself.

    Problem: A compaction creates a new SSTable of size Y, and both the new SSTable and the old SSTables of size X exist on disk while the compaction runs.

    In the worst case (no deletes or overwrites), a compaction temporarily requires twice the on-disk space used by the SSTables being compacted; more specifically, at certain points you need enough free disk space to hold both the SSTables of size X and the new SSTable of size Y.

    So even though it looks like you have enough space left, you can still run out of disk space in the middle of a compaction.

    You might want to try LeveledCompactionStrategy, because it needs much less headroom for compaction (roughly 10 × sstable_size_in_mb). See http://www.datastax.com/dev/blog/when-to-use-leveled-compaction for when to use LeveledCompactionStrategy.

    No matter which compaction strategy you use, you should always leave enough free disk space to accommodate streaming, repair, and snapshots.
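If you do decide to switch strategies, compaction is a per-table setting. A hypothetical example for the keyspace/table shown in compactionstats (`xxx`/`yyy` are the question's placeholder names, and 160 MB is the sstable_size_in_mb default in 2.1 - adjust to your workload):

```sql
-- Run in cqlsh; this triggers a gradual rewrite of existing SSTables.
ALTER TABLE xxx.yyy
WITH compaction = {'class': 'LeveledCompactionStrategy',
                   'sstable_size_in_mb': 160};
```

Note that the initial migration from STCS to LCS itself does a lot of compaction work, so free up disk space first.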
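To make the space problem concrete, here is a minimal shell sketch of the worst-case calculation, plugging in the `total` figure from the compactionstats output above (the 2× factor assumes no deletes or overwrites, as described):

```shell
# Worst-case free-space estimate for a SizeTiered compaction (sketch).
# Assumption: the output SSTable can grow as large as the combined
# inputs, so inputs and output briefly coexist on disk.
input_bytes=66396499686               # "total" from nodetool compactionstats
worst_case_bytes=$((input_bytes * 2))
echo "Worst case, this compaction needs ${worst_case_bytes} bytes of disk"
```

That works out to roughly 124 GiB for the compaction shown in the question, which can easily exceed what's left on an m3.xlarge instance-store volume.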
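As a rough sizing sketch of that headroom advice (the 50% rule of thumb for SizeTieredCompaction and the 80 GB volume are illustrative assumptions, not measurements from this cluster):

```shell
# Rule of thumb (assumption): with SizeTieredCompaction, keep about
# half of the data volume free so the largest compaction, plus
# streaming/repair/snapshot overhead, still fits.
volume_bytes=$((80 * 1024 * 1024 * 1024))   # example: 80 GB data volume
headroom_bytes=$((volume_bytes / 2))
echo "Keep at least ${headroom_bytes} bytes free"
```

With LeveledCompaction the required headroom is much smaller, but snapshots and repairs still need slack.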