ansible digital-ocean mesos mesosphere marathon

Setting up Mesos with Ansible on Ubuntu 14.04 on Digital Ocean


I have been following the tutorial "How to configure a production ready Mesos cluster" and have been building an Ansible playbook along the way, which you can see here: mesos ansible playbook.

Ansible runs successfully, and I can visit port 5050 on a master and see the Mesos dashboard. However, there seem to be three problems, which are hopefully all connected but look separate at face value.

  1. At the top of the Mesos dashboard it says no masters are currently leading
  2. No slaves are registered
  3. The Marathon dashboard does not work when I visit port 8080 on any of the masters

Any ideas about what I have done wrong, or whether anything has changed since this tutorial was published?

Edit: I tried to dig in deeper. After running Ansible I logged into each node and manually restarted the Mesos and Marathon services. This appeared to do the trick: I got to the Marathon dashboard, and after a bit of fiddling on the slaves I could see those were activated as well. Unfortunately, I was not able to reproduce this after nuking the nodes and rebuilding. My settings are consistent with the tutorial I linked and the tutorial linked by Celine, so I think the problem is the order in which I am restarting the services. Still looking for any help.

Edit2: Here is a copy of the logs from one of the masters on startup. The last HTTP call just repeats and repeats.

I1014 18:56:32.746968 11494 logging.cpp:172] INFO level logging started!
I1014 18:56:32.748177 11494 main.cpp:229] Build: 2015-10-12 20:57:28 by root
I1014 18:56:32.748277 11494 main.cpp:231] Version: 0.25.0
I1014 18:56:32.748345 11494 main.cpp:234] Git tag: 0.25.0
I1014 18:56:32.748406 11494 main.cpp:238] Git SHA: 2dd7f7ee115fe00b8e098b0a10762a4fa8f4600f
I1014 18:56:32.748615 11494 main.cpp:252] Using 'HierarchicalDRF' allocator
I1014 18:56:32.759768 11494 leveldb.cpp:176] Opened db in 10.929155ms
I1014 18:56:32.763638 11494 leveldb.cpp:183] Compacted db in 3.722708ms
I1014 18:56:32.763713 11494 leveldb.cpp:198] Created db iterator in 33931ns
I1014 18:56:32.763761 11494 leveldb.cpp:204] Seeked to beginning of db in 8624ns
I1014 18:56:32.764142 11494 leveldb.cpp:273] Iterated through 1 keys in the db in 352415ns
I1014 18:56:32.764263 11494 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1014 18:56:32.767266 11520 log.cpp:238] Attempting to join replica to ZooKeeper group
I1014 18:56:32.767493 11520 recover.cpp:449] Starting replica recovery
I1014 18:56:32.767623 11520 recover.cpp:475] Replica is in VOTING status
I1014 18:56:32.767695 11520 recover.cpp:464] Recover process terminated
I1014 18:56:32.775274 11494 main.cpp:465] Starting Mesos master
I1014 18:56:32.779567 11516 master.cpp:376] Master 75abeaaa-a949-45a3-bd85-bebf100eecad (159.203.107.10) started on 159.203.107.10:5050
I1014 18:56:32.779597 11516 master.cpp:378] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="159.203.107.10" --hostname_lookup="true" --initialize_driver_logging="true" --ip="159.203.107.10" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://159.203.107.10:2181,159.203.107.151:2181,159.203.107.162:2181/mesos" --zk_session_timeout="10secs"
I1014 18:56:32.779762 11516 master.cpp:425] Master allowing unauthenticated frameworks to register
I1014 18:56:32.779770 11516 master.cpp:430] Master allowing unauthenticated slaves to register
I1014 18:56:32.779778 11516 master.cpp:467] Using default 'crammd5' authenticator
W1014 18:56:32.779798 11516 authenticator.cpp:505] No credentials provided, authentication requests will be refused
I1014 18:56:32.779906 11516 authenticator.cpp:512] Initializing server SASL
I1014 18:56:32.791836 11515 master.cpp:1542] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1014 18:56:32.792043 11519 contender.cpp:149] Joining the ZK group
I1014 18:56:34.968217 11517 http.cpp:336] HTTP GET for /master/state.json from 12.228.115.34:40863 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
I1014 18:56:45.242039 11518 http.cpp:336] HTTP GET for /master/state.json from 12.228.115.34:63018 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
I1014 18:56:55.319259 11519 http.cpp:336] HTTP GET for /master/state.json from 12.228.115.34:50024 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 1

Thanks


Solution

  • This was a ZooKeeper configuration problem. None of the tutorials mention that zoo.cfg needs any values besides the list of server IPs. You also need to set dataDir, syncLimit, initLimit, tickTime, and clientPort.
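
To make the missing settings concrete, here is a minimal sketch of a check that could be run on each node before starting the Mesos and Marathon services. It is not part of the original playbook. The sample zoo.cfg values are assumptions: the numbers are commonly used ZooKeeper settings, /var/lib/zookeeper is a guess at the data directory, the 2888:3888 peer ports are the conventional ones, and the server IPs are taken from the --zk flag in the log above. The helper names parse_zoo_cfg and missing_keys are hypothetical.

```python
# Keys the answer says must be set in zoo.cfg, in addition to the server list.
REQUIRED_KEYS = ("dataDir", "syncLimit", "initLimit", "tickTime", "clientPort")

# Hypothetical sample config. The values are assumptions, not taken from the post,
# except for the server IPs, which come from the --zk flag in the master log.
SAMPLE_ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=159.203.107.10:2888:3888
server.2=159.203.107.151:2888:3888
server.3=159.203.107.162:2888:3888
"""


def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not key=value pairs
            settings[key.strip()] = value.strip()
    return settings


def missing_keys(text):
    """Return the required keys that are absent from the given zoo.cfg text."""
    settings = parse_zoo_cfg(text)
    return [key for key in REQUIRED_KEYS if key not in settings]
```

Calling missing_keys on the contents of a node's zoo.cfg returns an empty list when all five settings are present. A config that only lists the server lines, as the tutorials apparently describe, would return all five names.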