I'm trying to rebalance a 6-node DSE cluster using OpsCenter 5.2.1, and the job failed at 80% with the following stack trace in opscenterd.log. We're running DSE 4.8.0 with 6 search nodes. How do I recover from this error? Can I run the rebalance again, or should I run a repair first (or something else)?
2016-05-02 23:27:18+0000 [local] WARN: Marking request '10.0.30.57: /ops/cleanup' (3bf6010a-6790-4301-9bf1-53c37fac61d8) as failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] WARN: Marking request 'Cluster Rebalance' (6303d5ea-4382-40ff-905f-cbca051c7fa9) as failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rebalance failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
Failure: twisted.python.failure.DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rebalance failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rolling job failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 607, in run
DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rolling job failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 388, in errback
self._startRunCallbacks(fail)
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 455, in _startRunCallbacks
self._runCallbacks()
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 542, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1076, in gotResult
_inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1018, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/failure.py", line 349, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 613, in run
File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 607, in run
twisted.python.failure.DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
That exception was most likely thrown because a cleanup was already running on the node when opscenterd tried to start another cleanup after moving the node during the rebalance. The OpsCenter agent doesn't have the exception class (CompactionInterruptedException) on its classpath, so it couldn't deserialize it, which is why the log reports a ClassNotFoundException rather than the underlying error.
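If you want to confirm whether a cleanup is still active on a node before retrying, one option is to look at the output of `nodetool compactionstats`. The following is only a rough sketch: the helper names are made up for illustration, and it assumes that an active cleanup shows up as the word "Cleanup" somewhere in that output, which you should verify against your own version.

```python
import subprocess


def cleanup_in_progress(compactionstats_output):
    """Return True if any line of the given text mentions a cleanup task.

    Assumption: an active cleanup is listed with the word "Cleanup" in the
    `nodetool compactionstats` output. This is a plain substring match, so a
    keyspace or table whose name contains "cleanup" would also match.
    """
    return any(
        "cleanup" in line.lower()
        for line in compactionstats_output.splitlines()
    )


def fetch_compactionstats(host):
    """Run `nodetool -h <host> compactionstats` and return its text output.

    Hypothetical helper: requires nodetool on the PATH and JMX access to host.
    """
    raw = subprocess.check_output(["nodetool", "-h", host, "compactionstats"])
    return raw.decode("utf-8", errors="replace")
```

For example, `cleanup_in_progress(fetch_compactionstats("10.0.30.57"))` would check the node named in the failed request above.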
To continue, run a repair to make sure the data is everywhere it needs to be, then run a cleanup and let it finish to avoid a repeat of this issue, and then use OpsCenter to start another rebalance.
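The repair and cleanup steps can be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names are hypothetical, it assumes `nodetool` is on the PATH with JMX access to each node, and it uses plain `nodetool repair` and `nodetool cleanup` with no extra options. It defaults to a dry run that only prints the commands.

```python
import subprocess


def recovery_commands(hosts):
    """Build the nodetool commands: repair every node, then clean up every node."""
    commands = []
    for host in hosts:
        commands.append(["nodetool", "-h", host, "repair"])
    for host in hosts:
        commands.append(["nodetool", "-h", host, "cleanup"])
    return commands


def run_recovery(hosts, dry_run=True):
    """Print each command and, unless dry_run is set, run it and wait for it."""
    for argv in recovery_commands(hosts):
        print(" ".join(argv))
        if not dry_run:
            # check_call blocks until the command exits and raises on failure,
            # so the cleanups only start after every repair has returned.
            subprocess.check_call(argv)


# Dry run for the one node address that appears in the log above;
# the other five nodes' addresses would need to be added to the list.
run_recovery(["10.0.30.57"])
```

Once every command has finished without error, start the rebalance again from OpsCenter.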