I've been trying to set up Apache Mesos across two machines: one running just a slave, the other (james-pc) running both a master and a slave. I'm using the Mesosphere packages.
The slave on the master machine (james-pc) connects fine, but the slave on the other machine (riri-desktop) never seems to connect. Log excerpts below.
These are samples, so the timestamps may not line up exactly; I've rerun the same commands and re-read the log files many times :(
Thanks!!
Slave log (riri-desktop, the one that never connects):
I1015 13:44:40.098458 16485 main.cpp:126] Build: 2014-09-23 05:36:09 by root
I1015 13:44:40.098520 16485 main.cpp:128] Version: 0.20.1
I1015 13:44:40.098530 16485 main.cpp:131] Git tag: 0.20.1
I1015 13:44:40.098537 16485 main.cpp:135] Git SHA: fe0a39112f3304283f970f1b08b322b1e970829d
I1015 13:44:40.098558 16485 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I1015 13:44:40.100411 16485 main.cpp:149] Starting Mesos slave
I1015 13:44:40.101066 16485 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I1015 13:44:40.101238 16485 slave.cpp:278] Slave resources: cpus(*):4; mem(*):6649; disk(*):109050; ports(*):[31000-32000]
I1015 13:44:40.101335 16485 slave.cpp:306] Slave hostname: riri-desktop
I1015 13:44:40.101346 16485 slave.cpp:307] Slave checkpoint: true
I1015 13:44:40.102597 16489 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I1015 13:44:40.102684 16489 state.cpp:62] Failed to find the latest slave from '/tmp/mesos/meta'
I1015 13:44:40.102777 16493 status_update_manager.cpp:193] Recovering status update manager
I1015 13:44:40.102821 16493 containerizer.cpp:252] Recovering containerizer
I1015 13:44:40.102982 16491 slave.cpp:3198] Finished recovery
I1015 13:44:40.103219 16488 slave.cpp:589] New master detected at [email protected]:5050
I1015 13:44:40.103313 16488 slave.cpp:625] No credentials provided. Attempting to register without authentication
I1015 13:44:40.103317 16491 status_update_manager.cpp:167] New master detected at [email protected]:5050
I1015 13:44:40.103333 16488 slave.cpp:636] Detecting new master
I1015 13:45:40.109150 16487 slave.cpp:3053] Current usage 27.72%. Max allowed age: 4.359784084743518days
I1015 13:46:40.119501 16489 slave.cpp:3053] Current usage 27.72%. Max allowed age: 4.359794862235926days
Master log (james-pc):
I1015 13:47:55.462615 5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:47:58.048534 5671 http.cpp:466] HTTP request for '/master/state.json'
I1015 13:48:01.461993 5667 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:01.464038 5670 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-36 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:01.464246 5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.464457 5669 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.466624 5667 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-37 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:06.466841 5671 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:08.064483 5673 http.cpp:466] HTTP request for '/master/state.json'
I1015 13:48:12.465992 5674 master.cpp:3559] Sending 1 offers to framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:12.468195 5670 master.cpp:2169] Processing reply for offers: [ 20141015-134423-16842879-5050-5654-38 ] on slave 20141015-130401-16842879-5050-3432-0 at slave(1)@127.0.1.1:5051 (james-pc.syd.local) for framework 20141015-134423-16842879-5050-5654-0000
I1015 13:48:12.468408 5670 hierarchical_allocator_process.hpp:563] Recovered cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]) on slave 20141015-130401-16842879-5050-3432-0 from framework 20141015-134423-16842879-5050-5654-0000
Slave log (james-pc, the one that connects fine):
james@james-pc:/var/log/mesos$ cat mesos-slave.james-pc.invalid-user.log.INFO.20141015-134946.6069
Log file created at: 2014/10/15 13:49:46
Running on machine: james-pc
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1015 13:49:46.323657 6069 logging.cpp:142] INFO level logging started!
I1015 13:49:46.323825 6069 main.cpp:126] Build: 2014-09-23 05:36:09 by root
I1015 13:49:46.323837 6069 main.cpp:128] Version: 0.20.1
I1015 13:49:46.323842 6069 main.cpp:131] Git tag: 0.20.1
I1015 13:49:46.323846 6069 main.cpp:135] Git SHA: fe0a39112f3304283f970f1b08b322b1e970829d
I1015 13:49:46.323860 6069 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I1015 13:49:46.324012 6069 main.cpp:149] Starting Mesos slave
I1015 13:49:46.324472 6084 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I1015 13:49:46.324604 6084 slave.cpp:278] Slave resources: cpus(*):4; mem(*):6659; disk(*):107176; ports(*):[31000-32000]
I1015 13:49:46.324697 6084 slave.cpp:306] Slave hostname: james-pc.syd.local
I1015 13:49:46.324709 6084 slave.cpp:307] Slave checkpoint: true
I1015 13:49:46.326089 6079 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I1015 13:49:46.326375 6084 status_update_manager.cpp:193] Recovering status update manager
I1015 13:49:46.326452 6079 containerizer.cpp:252] Recovering containerizer
I1015 13:49:46.326608 6083 slave.cpp:3198] Finished recovery
I1015 13:49:46.327335 6084 group.cpp:313] Group process (group(1)@127.0.1.1:5051) connected to ZooKeeper
I1015 13:49:46.327352 6084 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1015 13:49:46.327360 6084 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I1015 13:49:46.328199 6085 detector.cpp:138] Detected a new leader: (id='5')
I1015 13:49:46.328272 6085 group.cpp:658] Trying to get '/mesos/info_0000000005' in ZooKeeper
I1015 13:49:46.328738 6084 detector.cpp:426] A new leading master ([email protected]:5050) is detected
I1015 13:49:46.328806 6085 slave.cpp:589] New master detected at [email protected]:5050
I1015 13:49:46.328881 6085 slave.cpp:625] No credentials provided. Attempting to register without authentication
I1015 13:49:46.328886 6078 status_update_manager.cpp:167] New master detected at [email protected]:5050
I1015 13:49:46.328897 6085 slave.cpp:636] Detecting new master
I1015 13:49:46.662595 6085 slave.cpp:816] Re-registered with master [email protected]:5050
W1015 13:50:19.134799 6078 slave.cpp:791] Already registered with master [email protected]:5050
I1015 13:50:46.338639 6082 slave.cpp:3053] Current usage 59.91%. Max allowed age: 2.106364690479491days
W1015 13:51:07.704756 6082 slave.cpp:791] Already registered with master [email protected]:5050
W1015 13:51:15.611064 6078 slave.cpp:791] Already registered with master [email protected]:5050
W1015 13:51:18.703999 6082 slave.cpp:791] Already registered with master [email protected]:5050
W1015 13:51:21.911741 6079 slave.cpp:791] Already registered with master [email protected]:5050
You're using local IP addresses:
I1015 13:49:46.324472 6084 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
Try setting those to the machines' real IPs; it may be that, bound to the 127.0.1.1 loopback address, the master and the remote slave can't talk to each other properly.
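A quick way to confirm is to check what the hostname resolves to: on Debian/Ubuntu the installer typically maps the hostname to 127.0.1.1 in /etc/hosts, and Mesos binds to whatever the hostname resolves to unless you give it an explicit IP. A minimal check (plain shell, nothing Mesos-specific):

getent hosts $(hostname)   # if this prints 127.0.1.1, the master/slave will bind to loopback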
A couple of places to look (I use the Mesosphere Google deploy); example values are sketched after the list.
Slave (some need the master's IP, some the slave's own IP):
/etc/mesos-slave/hostname
/etc/mesos-slave/attributes/host
/etc/mesos/zk
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/mapred-site.xml
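For example, here is a minimal sketch of what I would put on the extra slave machine with the Mesosphere packages, which (as far as I know) turn each file under /etc/mesos-slave into the matching --flag at startup. The addresses 192.168.0.10 (master) and 192.168.0.11 (slave) are placeholders, not taken from your logs; substitute your real ones:

# on the slave machine, as root; placeholder IPs -- use your own
echo 192.168.0.11 > /etc/mesos-slave/ip            # address the slave binds to
echo 192.168.0.11 > /etc/mesos-slave/hostname      # address the slave advertises to the master
echo zk://192.168.0.10:2181/mesos > /etc/mesos/zk  # where the slave looks for the master's ZooKeeper
service mesos-slave restart

After the restart, the slave log should show it starting on the real IP rather than 1)@127.0.1.1:5051, and the equivalent change on the master side (its own IP under /etc/mesos-master and in /etc/mesos/zk) should let the two register with each other.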
Hope it helps!