Tags: datastax-enterprise, opscenter, datastax-startup

OpsCenter 5.2.1 rebalance failed


I'm trying to rebalance a 6-node DSE cluster using OpsCenter 5.2.1, and the job failed at 80% with the following stack trace in opscenterd.log. We're running DSE 4.8.0 with 6 search nodes. How do I recover from this error? Can I run the rebalance again, or should I run a repair first (or something else)?

2016-05-02 23:27:18+0000 [local]  WARN: Marking request '10.0.30.57: /ops/cleanup' (3bf6010a-6790-4301-9bf1-53c37fac61d8) as failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local]  WARN: Marking request 'Cluster Rebalance' (6303d5ea-4382-40ff-905f-cbca051c7fa9) as failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rebalance failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
        Failure: twisted.python.failure.DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rebalance failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rolling job failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
          File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 607, in run
        DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Rolling job failed: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
2016-05-02 23:27:18+0000 [local] ERROR: Traceback (most recent call last):
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 388, in errback
            self._startRunCallbacks(fail)
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 455, in _startRunCallbacks
            self._runCallbacks()
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1076, in gotResult
            _inlineCallbacks(r, g, deferred)
        --- <exception caught here> ---
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1018, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/python/failure.py", line 349, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 613, in run
          File "build/lib/python2.7/site-packages/opscenterd/ClusterUtils.py", line 607, in run
        twisted.python.failure.DefaultException: java.rmi.UnmarshalException: Error unmarshaling return; nested exception is:
                java.lang.ClassNotFoundException: org.apache.cassandra.db.compaction.CompactionInterruptedException (no security manager: RMI class loader disabled)
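Every entry in the log excerpt above has the same root cause: a java.lang.ClassNotFoundException for org.apache.cassandra.db.compaction.CompactionInterruptedException, nested inside a java.rmi.UnmarshalException. To pull that out of a longer opscenterd.log, a small script can list the failed requests and the classes the RMI client could not load. This is a minimal sketch: the regular expressions assume the line format shown in this OpsCenter 5.2.1 excerpt, and `summarize_failures` is a hypothetical helper, not part of OpsCenter.

```python
import re
import sys
from typing import List, Set, Tuple

# Assumption: lines follow the OpsCenter 5.2.1 excerpt above, e.g.
#   WARN: Marking request '10.0.30.57: /ops/cleanup' (3bf6010a-...) as failed: ...
FAILED_REQUEST = re.compile(r"Marking request '([^']+)' \(([0-9a-f-]+)\) as failed")
# The nested cause: a class the RMI client could not load while unmarshaling.
MISSING_CLASS = re.compile(r"java\.lang\.ClassNotFoundException: ([\w.$]+)")


def summarize_failures(log_text: str) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Return ([(request name, request id), ...], {class names not found})."""
    failed = FAILED_REQUEST.findall(log_text)
    missing = set(MISSING_CLASS.findall(log_text))
    return failed, missing


def main(argv: List[str]) -> None:
    if len(argv) != 1:
        print("usage: <script> PATH_TO_OPSCENTERD_LOG", file=sys.stderr)
        return
    with open(argv[0], encoding="utf-8", errors="replace") as fh:
        requests, classes = summarize_failures(fh.read())
    for name, request_id in requests:
        print(f"failed request: {name} ({request_id})")
    for cls in sorted(classes):
        print(f"class not found on client: {cls}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Pass the path to opscenterd.log as the only argument.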

Solution

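  • That exception was most likely thrown because a cleanup was already running on the node when opscenterd tried to start another cleanup after moving a node during the rebalance. The node reported the failure as an org.apache.cassandra.db.compaction.CompactionInterruptedException, but the OpsCenter agent doesn't have that class on its classpath, so it couldn't deserialize the exception. That is why the log shows a ClassNotFoundException nested inside a java.rmi.UnmarshalException.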

    To continue, run a repair so that the data is everywhere it needs to be, then run a cleanup and let it finish so that no cleanup is still in progress when the rebalance starts its own (avoiding a repeat of this issue), and then use OpsCenter to start another rebalance.
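The repair-then-cleanup order from the answer, plus a pre-check for the condition it suspects (a cleanup already running), can be scripted around nodetool. This is a minimal sketch under several assumptions: the `NODES` list is hypothetical (only 10.0.30.57 appears in the log), the `compactionstats` parsing assumes the Cassandra 2.1-era output layout used by DSE 4.8, running `repair -pr` on every node is a design choice rather than something the answer specifies, and the final rebalance stays a manual step in the OpsCenter UI because no OpsCenter API is assumed here. It only prints the commands unless given `--execute`.

```python
import subprocess
import sys
from typing import List

# Hypothetical node list: only 10.0.30.57 appears in the question's log.
# Replace it with the addresses of all six nodes in your cluster.
NODES = ["10.0.30.57"]


def cleanup_running(compactionstats_output: str) -> bool:
    """Return True if `nodetool compactionstats` output lists an active Cleanup.

    Assumes the Cassandra 2.1-era layout, where each active task is a row whose
    first column is the operation type, e.g. "Compaction" or "Cleanup".
    """
    for line in compactionstats_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Cleanup":
            return True
    return False


def recovery_plan(nodes: List[str]) -> List[List[str]]:
    """Repair every node first, then clean up every node."""
    # Design choice: `repair -pr` repairs only each node's primary ranges,
    # so it must run on every node to cover the whole ring.
    repairs = [["nodetool", "-h", node, "repair", "-pr"] for node in nodes]
    cleanups = [["nodetool", "-h", node, "cleanup"] for node in nodes]
    return repairs + cleanups


def main(execute: bool) -> None:
    plan = recovery_plan(NODES)
    if not execute:
        # Dry run (default): only print the commands.
        for cmd in plan:
            print(" ".join(cmd))
        return
    # Refuse to start while any node is still running a cleanup, the
    # condition suspected of breaking the first rebalance.
    for node in NODES:
        stats = subprocess.run(
            ["nodetool", "-h", node, "compactionstats"],
            capture_output=True, text=True, check=True,
        ).stdout
        if cleanup_running(stats):
            sys.exit(f"cleanup still running on {node}; wait for it to finish")
    for cmd in plan:
        # nodetool repair and cleanup return when the operation completes,
        # so the steps run strictly one after another.
        subprocess.run(cmd, check=True)
    print("Repair and cleanup done; start a new rebalance from OpsCenter.")


if __name__ == "__main__":
    main(execute="--execute" in sys.argv[1:])
```

Run it without arguments to review the commands, then with `--execute` once the node list is filled in.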