I am using DataStax OpsCenter v5.2.4 with DSE v4.8.4-1. For the past few days, I have not been able to retrieve the results of the "Best Practice Service", either through the API or the OpsCenter UI. When I try, I get errors like the ones below.
GUI:
Could not retrieve best practice rules: Scheduled jobs have not been loaded yet. There may be a connectivity problem with Cassandra.
opscenterd.log:
2016-07-01 19:47:50+0000 [] ERROR: Problem while calling decorator (SchedulesNotLoaded): Scheduled jobs have not been loaded yet. There may be a connectivity problem with Cassandra.
File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1020, in _inlineCallbacks
result = g.send(result)
File "/usr/lib/python2.7/dist-packages/opscenterd/WebServer.py", line 1939, in SchedulesGetController
File "/usr/lib/python2.7/dist-packages/opscenterd/Schedule.py", line 213, in getAllSchedules
File "/usr/lib/python2.7/dist-packages/opscenterd/Schedule.py", line 175, in _assert_loaded
I've tried restarting the opscenterd service and rebooting the OpsCenter machine itself, but it made no difference. The error says there might be a connectivity issue, but which port/protocol does OpsCenter use to load these scheduled jobs? (There is no firewall between OpsCenter and the Cassandra nodes.) There are no alerts in the cluster, and according to the OpsCenter GUI all agents are connected.
I couldn't find any relevant troubleshooting documentation. How can we recover OpsCenter from this situation?
The issue occurs when the schedule settings get into a bad state. If you schedule a one-time-only run and opscenterd is down at the time it is scheduled to run, then on startup opscenterd fails while loading that schedule.
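To connect that cause to the log in the question, here is a toy Python model. Only the names SchedulesNotLoaded, getAllSchedules and _assert_loaded come from the traceback; the Schedule fields, the ToyScheduler class and the ValueError raised during loading are hypothetical stand-ins, not OpsCenter's actual Schedule.py. In this model, if loading aborts on one bad entry, the "loaded" flag is never set, so every later request fails with the same message regardless of how healthy the connection to Cassandra is. Because the bad entry stays stored, it is read again on every startup, which would also explain why restarting opscenterd does not help.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


class SchedulesNotLoaded(Exception):
    """Same name as the exception in opscenterd.log; body is a toy."""


@dataclass
class Schedule:
    # Hypothetical record: interval_seconds is None for a one-time-only run.
    job_id: str
    first_run: datetime
    interval_seconds: Optional[int]


class ToyScheduler:
    """Toy model only; not OpsCenter's real scheduler."""

    def __init__(self) -> None:
        self._loaded = False
        self._schedules: List[Schedule] = []

    def load(self, stored: List[Schedule], now: datetime) -> None:
        for s in stored:
            if s.interval_seconds is None and s.first_run < now:
                # One-time run whose time passed while the daemon was down:
                # loading aborts here, so _loaded is never set to True.
                raise ValueError(f"missed one-time schedule {s.job_id}")
            self._schedules.append(s)
        self._loaded = True

    def _assert_loaded(self) -> None:
        if not self._loaded:
            raise SchedulesNotLoaded("Scheduled jobs have not been loaded yet.")

    def get_all_schedules(self) -> List[Schedule]:
        # Mirrors getAllSchedules -> _assert_loaded from the traceback.
        self._assert_loaded()
        return list(self._schedules)


if __name__ == "__main__":
    now = datetime(2016, 7, 1, 19, 47, 50)
    stored = [
        Schedule("repair-weekly", now - timedelta(days=3), 7 * 24 * 3600),
        Schedule("backup-once", now - timedelta(days=1), None),
    ]
    sched = ToyScheduler()
    try:
        sched.load(stored, now)
    except ValueError as err:
        print("startup load failed:", err)
    try:
        sched.get_all_schedules()
    except SchedulesNotLoaded as err:
        print("request failed:", err)
```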
If you don't have anything particularly important stored there, there is an easy fix. Shut down OpsCenter, use cqlsh to drop the keyspace with
DROP KEYSPACE "OpsCenter";
and then restart OpsCenter.
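The reset above is a three-step command sequence. The Python sketch below only prints those steps by default (dry run), so they can be reviewed before anything is executed. It assumes a package install where the daemon is controlled with "service opscenterd", a cqlsh that accepts the -e/--execute option, and that 127.0.0.1 is a node of the cluster holding the OpsCenter keyspace; adjust all three for your setup. The keyspace name is double-quoted because it is mixed-case: unquoted CQL identifiers are folded to lowercase, so an unquoted OpsCenter would refer to opscenter instead.

```python
import shlex
import subprocess
from typing import List


def cql_quote_identifier(name: str) -> str:
    # Quoted CQL identifiers keep their case; an embedded " is written as "".
    return '"' + name.replace('"', '""') + '"'


def reset_commands(keyspace: str = "OpsCenter",
                   cqlsh_host: str = "127.0.0.1") -> List[List[str]]:
    # Assumed service name and cqlsh invocation; adjust for your install.
    drop = f"DROP KEYSPACE {cql_quote_identifier(keyspace)};"
    return [
        ["sudo", "service", "opscenterd", "stop"],
        ["cqlsh", cqlsh_host, "-e", drop],
        ["sudo", "service", "opscenterd", "start"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact shell command
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(reset_commands())  # dry run: prints the commands only
```

Keeping the default as a dry run is deliberate: dropping the keyspace discards everything OpsCenter has stored, so the commands are worth reading once before calling run(..., dry_run=False).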
Otherwise, you have to manually clean up the schedules in the OpsCenter.settings table that got into a bad state. This is fixed in 6.0, so if you upgrade, it won't happen again.
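Before hand-editing OpsCenter.settings, it can help to decide up front which schedule entries to remove. The sketch below assumes you have already read the schedule entries out of that table and decoded each one into a dict; the field names id, interval and first_run, and the ISO timestamp format, are hypothetical, since the actual column layout and serialization are not described here (inspect the table with cqlsh first). It flags one-time entries whose run time has already passed, which is the condition described above, and keeps everything else.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def plan_cleanup(entries: List[Dict[str, Optional[str]]],
                 now: datetime) -> Tuple[List[str], List[str]]:
    """Split hypothetical decoded schedule entries into (keep, remove) id lists.

    The keys "id", "interval" and "first_run" are illustrative assumptions,
    not OpsCenter's real storage format.
    """
    keep: List[str] = []
    remove: List[str] = []
    for entry in entries:
        one_time = entry.get("interval") is None
        first_run = datetime.fromisoformat(str(entry["first_run"]))
        if one_time and first_run < now:
            # Missed one-time run: candidate for manual deletion.
            remove.append(str(entry["id"]))
        else:
            keep.append(str(entry["id"]))
    return keep, remove


if __name__ == "__main__":
    now = datetime(2016, 7, 1, 19, 47, 50)
    entries = [
        {"id": "weekly-repair", "interval": "604800", "first_run": "2016-06-01T00:00:00"},
        {"id": "one-off-backup", "interval": None, "first_run": "2016-06-28T02:00:00"},
        {"id": "future-one-off", "interval": None, "first_run": "2016-07-10T02:00:00"},
    ]
    keep, remove = plan_cleanup(entries, now)
    print("keep:", keep)
    print("remove:", remove)
```

Keeping the selection as a pure function makes it easy to review the remove list before issuing any deletes by hand.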