DSE Version 4.8.2
I added 6 new nodes to a cluster that already holds data, with auto_bootstrap: false set on each. After they joined, I started running a rebuild on each one. I believe one node is done, but the command at the console is still "running" (for example, I cannot run another command yet). I want to make sure it is fully done. There is no compaction and no stream active on that node. UPDATE: it has now been 4 days and the command is still sitting at the prompt.
Is there anything other than nodetool compactionstats and nodetool netstats that maybe I am missing? I saw it stream the data, then compact it, but now nothing seems to be happening.
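For reference, these are the checks I have been running so far (the log path assumes a standard package install; adjust if yours differs):

    # Any compactions still running or pending on the new node?
    nodetool compactionstats

    # Any streams still active to or from the new node?
    nodetool netstats

    # Look for rebuild start/finish messages in the system log
    grep -i rebuild /var/log/cassandra/system.log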
One more question: after I have fully finished the rebuilds and then the cleanups, are there any other tasks I should consider to fully sync the cluster?
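For context, this is the post-rebuild sequence I had in mind (a sketch; the repair step is my own assumption, meant to catch any writes the new nodes missed while joining):

    # On each pre-existing node, once every rebuild has finished:
    nodetool cleanup        # drop data those nodes no longer own

    # Then on every node, one at a time:
    nodetool repair -pr     # primary-range repair to re-sync replicas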
UPDATE:
As I try to run a rebuild, I keep getting the following error. I raised my open file limit on Ubuntu 14.04 to 200,000 and I still get the error.
    INFO  [MemtableFlushWriter:747] 2016-02-29 03:57:18,114 Memtable.java:382 - Completed flushing /media/slot02/cjd/match-b633b251a04f11e58b7b89a485a622c1/cjd-match-tmp-ka-127932-Data.db (71.866MiB) for commitlog position ReplayPosition(segmentId=1456708497054, position=14141564)
    INFO  [ScheduledTasks:1] 2016-02-29 03:58:33,573 ColumnFamilyStore.java:905 - Enqueuing flush of compaction_history: 17177 (0%) on-heap, 0 (0%) off-heap
    INFO  [MemtableFlushWriter:748] 2016-02-29 03:58:33,574 Memtable.java:347 - Writing Memtable-compaction_history@971836863(3.428KiB serialized bytes, 123 ops, 0%/0% of on/off-heap limit)
    INFO  [MemtableFlushWriter:748] 2016-02-29 03:58:33,575 Memtable.java:382 - Completed flushing /media/slot01/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-tmp-ka-142-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1456708497058, position=20942643)
    WARN  [STREAM-IN-/10.0.1.243] 2016-02-29 04:00:02,317 CLibrary.java:231 - open(/media/slot01/cjd/match-b633b251a04f11e58b7b89a485a622c1, O_RDONLY) failed, errno (24).
    ERROR [STREAM-IN-/10.0.1.243] 2016-02-29 04:00:02,541 JVMStabilityInspector.java:117 - JVM state determined to be unstable. Exiting forcefully due to:
    java.io.FileNotFoundException: /media/slot01/cjd/match-b633b251a04f11e58b7b89a485a622c1/cjd-match-tmp-ka-128371-Index.db (Too many open files)
        at java.io.RandomAccessFile.open0(Native Method) ~[na:1.8.0_72]
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316) ~[na:1.8.0_72]
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) ~[na:1.8.0_72]
        at org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:78) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:111) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:106) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:587) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:140) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:81) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.StreamReader.createWriter(StreamReader.java:135) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:80) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
    INFO  [Thread-2] 2016-02-29 04:00:02,543 DseDaemon.java:418 - DSE shutting down...
My open file limit is currently 200,000 according to ulimit -a. I could try to go higher, but the Cassandra recommended setting is only 100,000.
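In case it matters, here is how I checked that the limit actually applied to the running DSE process and not just my login shell (the pgrep pattern is a guess; adjust it to match your process):

    # Effective limit of the live JVM, which can differ from `ulimit -a` in a shell
    cat /proc/$(pgrep -f cassandra | head -1)/limits | grep 'Max open files'

    # Persistent setting in /etc/security/limits.conf for the service user:
    # cassandra - nofile 200000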
If I had to GUESS, the issue is that one node can't compact because 2 of its disks are full, and that node is a source for the rebuild. As the rebuild pulls data from it, it is streaming around 50,000 small files, unlike the other source nodes, which stream 1-2 larger files. Maybe that is what I have to fix first?
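To test that guess, I can count the SSTables for that table on the suspect source node and check its disks (the directory path is taken from the log above):

    # How many SSTables does the 'match' table have on the source node?
    find /media/slot01/cjd/match-b633b251a04f11e58b7b89a485a622c1 -name '*-Data.db' | wc -l

    # How full are the data mounts?
    df -h /media/slot01 /media/slot02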
Really need help... Thanks!
"I added 6 new nodes to a cluster with data and put auto_bootstrap: false"
First, this is wrong: according to the documentation, when adding a new node to an existing cluster you should set auto_bootstrap: true.
See here: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
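Concretely, on each new node the setting in cassandra.yaml should be (true is also the default when the line is absent entirely):

    # cassandra.yaml on the joining node
    auto_bootstrap: true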
Second, it is recommended to add nodes one or two at a time, not all 6 at once, because simultaneous joins put a lot of pressure on the network (due to data streaming). A rough sketch of the staggered procedure follows.
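This assumes a package install where DSE runs as a service; DataStax recommends waiting about two minutes between starting each node:

    # Repeat for each new node, one at a time:
    sudo service dse start    # start DSE on the new node
    nodetool status           # wait until the node reports UN (Up/Normal)
    nodetool netstats         # confirm bootstrap streaming has completed
    # then wait ~2 minutes before starting the next node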