Tags: apache-zookeeper, mesos, mesosphere, marathon

Apache Mesos: ZooKeeper failing on load, cannot access Marathon or attach slaves


I am setting up a Mesos cluster. Our setup is:

  • 3 primary boxes (8 GB RAM, 4 CPUs)
  • 3 worker boxes (1 GB RAM, 1 CPU)

As far as I can tell, the configuration files are consistent and correct on every machine. In /etc/mesos/zk I have:

zk://106.133.117.128:2181,zk://153.213.95.171:2181,zk://106.121.34.29:2181/mesos

(I have replaced the real IP addresses, but I use the same substitute addresses consistently wherever they appear below.)

I am not quite sure where to go from here. I have stepped through each piece of configuration.

The IDs are stored in /etc/zookeeper/conf/myid on each machine and are set up correctly. In the ZooKeeper config on each machine, every ID is also mapped to the matching IP.
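One way to double-check the myid/config pairing on each box is a small script like the one below. It is a minimal sketch that assumes the ensemble is listed with the standard ZooKeeper `server.N=host:peerPort:electionPort` lines; the function names, the inline sample config, the ID-to-IP assignment, and the 2888/3888 ports are illustrative, not taken from this cluster.

```python
from __future__ import annotations

import re

# Matches ensemble entries such as "server.1=106.133.117.128:2888:3888".
# The ports here are illustrative (2888 for peers, 3888 for leader election).
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_servers(zoo_cfg: str) -> dict[int, str]:
    """Map each server ID listed in a zoo.cfg-style text to its host."""
    servers: dict[int, str] = {}
    for line in zoo_cfg.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def myid_matches(zoo_cfg: str, myid_text: str, local_ip: str) -> bool:
    """True if the ID read from the myid file maps to this machine's IP."""
    return parse_servers(zoo_cfg).get(int(myid_text.strip())) == local_ip


# Hypothetical config using the (anonymized) addresses from the question.
SAMPLE_CFG = """
tickTime=2000
server.1=106.133.117.128:2888:3888
server.2=153.213.95.171:2888:3888
server.3=106.121.34.29:2888:3888
"""

print(myid_matches(SAMPLE_CFG, "1\n", "106.133.117.128"))  # True
print(myid_matches(SAMPLE_CFG, "2\n", "106.133.117.128"))  # False
```

Running it on each box with that box's real config text, myid contents, and IP should print True everywhere if the pairing is right.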

My quorum size (--quorum) is 2.
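For reference, the Mesos replicated-log registry needs `--quorum` to be a strict majority of the masters, i.e. floor(N/2) + 1. A quick sketch (the helper name `majority_quorum` is hypothetical) confirms that 2 is the right value for 3 masters, so the quorum setting itself looks fine:

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that is a strict majority: floor(N/2) + 1."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


for n in (1, 3, 5):
    print(f"{n} master(s) -> --quorum={majority_quorum(n)}")
# 1 master(s) -> --quorum=1
# 3 master(s) -> --quorum=2
# 5 master(s) -> --quorum=3
```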

On each master, the IP and hostname settings are both set to that machine's own IP.

The Marathon configuration in /etc/marathon/conf/master reads:

zk://106.133.117.128:2181,zk://153.213.95.171:2181,zk://106.121.34.29:2181/marathon

The exact error from the logs is:

Log file created at: 2015/10/01 13:56:32
Running on machine: mesos-primary-1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1001 13:56:32.595760  6618 logging.cpp:172] INFO level logging started!
I1001 13:56:32.596060  6618 main.cpp:229] Build: 2015-09-25 19:13:24 by root
I1001 13:56:32.596082  6618 main.cpp:231] Version: 0.24.1
I1001 13:56:32.596094  6618 main.cpp:234] Git tag: 0.24.1
I1001 13:56:32.596106  6618 main.cpp:238] Git SHA: 44873806c2bb55da37e9adbece938274d8cd7c48
I1001 13:56:32.596161  6618 main.cpp:252] Using 'HierarchicalDRF' allocator
I1001 13:56:32.602738  6618 leveldb.cpp:176] Opened db in 6.456045ms
I1001 13:56:32.611217  6618 leveldb.cpp:183] Compacted db in 8.423531ms
I1001 13:56:32.611312  6618 leveldb.cpp:198] Created db iterator in 22068ns
I1001 13:56:32.611348  6618 leveldb.cpp:204] Seeked to beginning of db in 1287ns
I1001 13:56:32.611372  6618 leveldb.cpp:273] Iterated through 0 keys in the db in 376ns
I1001 13:56:32.611448  6618 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1001 13:56:32.647243  6648 log.cpp:238] Attempting to join replica to ZooKeeper group
I1001 13:56:32.689388  6651 recover.cpp:449] Starting replica recovery
I1001 13:56:32.690028  6651 recover.cpp:475] Replica is in EMPTY status
W1001 13:56:32.690147  6649 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second
W1001 13:56:32.690726  6644 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second
W1001 13:56:32.690768  6647 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second
W1001 13:56:32.690821  6645 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second
I1001 13:56:32.690891  6618 main.cpp:465] Starting Mesos master
I1001 13:56:32.691463  6618 master.cpp:378] Master 20151001-135632-2088076136-5050-6618 (104.131.117.124) started on 106.133.117.128:5050
I1001 13:56:32.691494  6618 master.cpp:380] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="106.133.117.128" --initialize_driver_logging="true" --ip="106.133.117.128" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://106.133.117.128:2181,zk://153.213.95.171:2181,zk://106.121.34.29:2181/mesoss" --zk_session_timeout="10secs"
I1001 13:56:32.691671  6618 master.cpp:427] Master allowing unauthenticated frameworks to register
I1001 13:56:32.691700  6618 master.cpp:432] Master allowing unauthenticated slaves to register
I1001 13:56:32.691725  6618 master.cpp:469] Using default 'crammd5' authenticator
W1001 13:56:32.691756  6618 authenticator.cpp:505] No credentials provided, authentication requests will be refused.
I1001 13:56:32.691790  6618 authenticator.cpp:512] Initializing server SASL
I1001 13:56:32.695333  6646 master.cpp:1464] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1001 13:56:32.695377  6646 contender.cpp:149] Joining the ZK group
W1001 13:56:33.690989  6649 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second
W1001 13:56:33.691220  6644 zookeeper.cpp:101] zookeeper_init failed: Invalid argument ; retrying in 1 second

Any input is much appreciated.


Solution

  • Your Mesos ZooKeeper string is malformed: the zk:// scheme should appear only once, at the start, not in front of every host. It should be of the form

    zk://host1:port1,host2:port2,host3:port3/path
    

    That should fix your issue (barring any other configuration problems).

    Answered on #mesos on freenode as well.
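As a rough illustration of that fix, the snippet below rewrites a string shaped like the one in /etc/mesos/zk into the single-scheme form. `normalize_zk_url` is a hypothetical helper, not part of Mesos or Marathon; it only rearranges text and does not check that the hosts are reachable. The Marathon string in the question has the same repeated zk:// shape, so the same rewrite applies to it.

```python
def normalize_zk_url(url: str) -> str:
    """Rewrite 'zk://h1:p1,zk://h2:p2/path' as 'zk://h1:p1,h2:p2/path'."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a {prefix} URL: {url!r}")
    entries = [e.strip() for e in url[len(prefix):].split(",")]
    # The malformed form repeats the scheme on every host; strip it.
    entries = [e[len(prefix):] if e.startswith(prefix) else e for e in entries]
    # Only the last entry should carry the znode path (e.g. "/mesos").
    last_host, slash, path = entries[-1].partition("/")
    if not slash or not path:
        raise ValueError(f"missing znode path in {url!r}")
    hosts = entries[:-1] + [last_host]
    if not all(hosts) or any("/" in h for h in hosts):
        raise ValueError(f"bad host list in {url!r}")
    return prefix + ",".join(hosts) + "/" + path


bad = "zk://106.133.117.128:2181,zk://153.213.95.171:2181,zk://106.121.34.29:2181/mesos"
print(normalize_zk_url(bad))
# zk://106.133.117.128:2181,153.213.95.171:2181,106.121.34.29:2181/mesos
```

An already well-formed string passes through unchanged, so the helper is safe to run on both the Mesos and Marathon values.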