amazon-web-services hdfs mesos mesosphere dcos

Can't access HDFS on Mesosphere DC/OS despite "healthy" status


I deployed a Mesos cluster on AWS using the CloudFormation script and instructions found here, with the default cluster settings (5 private slaves, one public slave, a single master, all m3.xlarge), and installed HDFS on the cluster with the dcos CLI: dcos package install hdfs.

According to the DC/OS web UI and Marathon, the HDFS service is up and healthy.

[Screenshot: Mesos UI showing running HDFS]

The problem: at this point I should be able to SSH into my slave nodes and run hadoop fs commands, but doing so returns the error -bash: hadoop: command not found (in other words, there is no hadoop client installed on the node).
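
To see exactly which client binaries a node exposes, a check along these lines can be run after SSHing in. This is only a sketch: have_cmd and report_clients are hypothetical helper names, not part of DC/OS or Hadoop.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report which Hadoop client binaries, if any, this node exposes.
report_clients() {
  for c in hadoop hdfs; do
    if have_cmd "$c"; then
      echo "$c: found"
    else
      echo "$c: not on PATH"
    fi
  done
}
```

Calling report_clients on a node that produces the error above should print "hadoop: not on PATH".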

There are no errors in the STDOUT and STDERR logs for the HDFS service, but for what it's worth, a recurring "offer decline" message appears in the logs:

Processing DECLINE call for offers: [ 5358a8d8-74b4-4f33-9418-b76578d6c82b-O8390 ] for framework 5358a8d8-74b4-4f33-9418-b76578d6c82b-0001 (hdfs) at scheduler-60fe6c75-9288-49bc-9180-f7a271c …

I'm sure I'm missing something silly.


Solution

  • I figured out a way to at least verify that HDFS is running on your Mesos DC/OS cluster after installation.

    1. SSH into your master with the dcos CLI: dcos node ssh --master-proxy --leader
    2. Start a Docker container that has Hadoop installed and use it to query HDFS: docker run -ti cloudera/quickstart hadoop fs -ls hdfs://namenode-0.hdfs.mesos:9001/
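
The second step can be wrapped in a small shell function so that other hadoop fs operations reuse the same image and namenode address. This is a minimal sketch, to be run on the master after step 1. The image and URI are the ones from the steps above; hdfs_fs, HDFS_URI, HADOOP_IMAGE and DRY_RUN are hypothetical names introduced here, and --rm is added so the throwaway container is removed on exit.

```shell
# Namenode address from the steps above; adjust if your install differs.
HDFS_URI="${HDFS_URI:-hdfs://namenode-0.hdfs.mesos:9001}"
# Image from the steps above (large, more than 4 GB).
HADOOP_IMAGE="${HADOOP_IMAGE:-cloudera/quickstart}"

# hdfs_fs (hypothetical helper): run `hadoop fs <op> <path>` in a throwaway container.
# With DRY_RUN=1 it only prints the command instead of running it.
hdfs_fs() {
  op="$1"
  path="${2:-/}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "docker run --rm -ti $HADOOP_IMAGE hadoop fs $op ${HDFS_URI}${path}"
  else
    docker run --rm -ti "$HADOOP_IMAGE" hadoop fs "$op" "${HDFS_URI}${path}"
  fi
}
```

For example, hdfs_fs -ls / lists the HDFS root, and DRY_RUN=1 hdfs_fs -ls / shows the docker command without pulling the image.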

    Why this isn't a good solution & what to look out for:

    1. Earlier documentation all points to a default URL of hdfs://hdfs/, which throws a java.net.UnknownHostException here. I don't like pointing directly at a single namenode.
    2. Other documentation suggests you can run hdfs fs ... commands when you SSH into your cluster; this does not work as documented.
    3. The image I used just to test HDFS access is larger than 4 GB. Are there better options?
    4. None of this is documented (or at least not clearly or completely, which is why I'm keeping this post updated). I had to dig through the DC/OS Slack chat to find an answer.
    5. The Mesosphere/HDFS repo contains a completely different version from the HDFS that dcos package install hdfs installs. That repo is no longer maintained, and the new version hasn't been open-sourced yet (which I guess explains the lack of current documentation).
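
Since the UnknownHostException in point 1 is a name-resolution failure, one way to see which host names are usable is to check them on the master before starting the container. This is a minimal sketch, assuming getent is available on the node; resolves and check_hosts are hypothetical helper names, and resolution inside a container may differ from the host.

```shell
# resolves (hypothetical helper): succeeds if the host name resolves on this machine.
# Assumes getent is available; if it is missing, every name is reported as unresolved.
resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# Print the resolution status of each candidate name.
check_hosts() {
  for h in "$@"; do
    if resolves "$h"; then
      echo "$h: resolves"
    else
      echo "$h: does not resolve"
    fi
  done
}
```

For example, check_hosts hdfs namenode-0.hdfs.mesos compares the name from the old default URL with the namenode name used in step 2.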

    I'm hoping there is an easier way to interface with HDFS that I'm still missing. Any better solutions would still be very helpful!