DCOS navstar service failed to start on agent nodes


I'm setting up DC/OS on dev servers, and one of the agent nodes fails to start the navstar service:

# journalctl -u dcos-navstar -b
Mar 18 13:45:15 localhost.localdomain systemd[1]: Starting Navstar: A distributed systems & network overlay orchestration engine...
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Checking whether time is synchronized using the kernel adjtimex API.
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Time can be synchronized via most popular mechanisms (ntpd, chrony, systemd-timesyncd, etc.)
Mar 18 13:45:15 localhost.localdomain check-time[5868]: Time is in sync!
Mar 18 13:45:15 localhost.localdomain ping[5870]: ping: ready.spartan: Name or service not known
Mar 18 13:45:15 localhost.localdomain systemd[1]: dcos-navstar.service: control process exited, code=exited status=2
Mar 18 13:45:15 localhost.localdomain systemd[1]: Failed to start Navstar: A distributed systems & network overlay orchestration engine.

The ntpd service is installed and running (the service is active), and time synchronization with ntpd works fine. Please advise.


Solution

  • Check that UDP port 123 (used by NTP) is open and not blocked by iptables or another firewall. Alternatively, try chrony as the service that synchronizes the system clock with NTP servers (it is more accurate and has more features than ntpd). On CentOS:

    yum install chrony
    

    I had the same trouble with DC/OS: not only navstar.service but also metronome.service failed to start (same time sync issue). I spent a lot of time searching for the root of the problem. Finally I migrated to chrony, and the problem disappeared.
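
The migration described above can be sketched roughly as follows. This is only an illustration, assuming CentOS 7 with systemd, the stock unit names ntpd and chronyd, and root privileges. Stopping and disabling ntpd first is an assumption added here (both daemons want UDP port 123), and the names DRY_RUN, run, and migrate_to_chrony are hypothetical helpers, not part of DC/OS or chrony.

```shell
#!/bin/sh
# Sketch: replace ntpd with chrony on a DC/OS agent, then restart navstar.
# DRY_RUN is a hypothetical switch: 1 (default) only prints the commands,
# 0 actually executes them (needs root).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_to_chrony() {
    # Assumption: ntpd is stopped so it does not compete with chronyd.
    run systemctl stop ntpd
    run systemctl disable ntpd
    # Install and start chrony (same package as in the answer above).
    run yum install -y chrony
    run systemctl enable chronyd
    run systemctl start chronyd
    # Show the current synchronization state reported by chronyd.
    run chronyc tracking
    # Retry the unit that failed in the question's journal output.
    run systemctl restart dcos-navstar
}

migrate_to_chrony
```

By default the script only lists the commands, each prefixed with "+ ", so they can be reviewed first. Running it with DRY_RUN=0 as root executes them. If navstar still fails afterwards, journalctl -u dcos-navstar -b shows the new startup log.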