
Cannot access HDFS from MapR Data Science Refinery docker container


I am trying to follow this article (https://mapr.com/blog/how-to-run-data-science-refinery-from-an-edge-node/) to set up the DSR docker image (tag: v1.1_6.0.0_4.1.0_centos7) on an edge node (see here for how to set some of the env.list values: https://mapr.com/docs/61/AdvancedInstallation/Env_Variables_Installer_Container.html). However, after starting the container with

docker run --rm -it --env-file ./mapr-docker-env.list \
  --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse -p 9995:9995 \
  -p 10000-10010:10000-10010 -v /tmp/maprticket_10003:/tmp/dsr_ticket:ro \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro docker.io/maprtech/data-science-refinery
Container timezone will be set from value passed in MAPR_TZ:
....
....
....
opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-1.8.jar:/opt/mapr/lib/flexjson-2.1.jar
org.apache.livy.server.LivyServer, logging to
/opt/mapr/livy/livy-0.5.0/logs/livy-myuser-server.out
Log dir doesn't exist, create /opt/mapr/zeppelin/zeppelin-0.8.0/logs
Zeppelin start                                             [  OK  ]

I am unable to access the MapR HDFS from the container as expected. That is, running

ls -lha /mapr/ourcluster.name.local/

from within the container shows that the location does not exist. Yet checking the maprticket on the host machine with maprlogin print shows that the ticket is still valid, that it can be used to access the HDFS from the host (e.g. hadoop fs -ls /), and that its location is written correctly in the env.list file. Does anyone else using this docker image know what's happening here?
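
For reference, the ticket-related entries in my mapr-docker-env.list look roughly like the sketch below. The variable names are taken from the MapR environment-variable docs linked above; the cluster hosts, UID, and timezone shown here are placeholders rather than my exact settings.

# mapr-docker-env.list (excerpt) -- placeholder values
MAPR_CLUSTER=ourcluster.name.local
MAPR_CLDB_HOSTS=cldb1.name.local,cldb2.name.local
MAPR_CONTAINER_USER=myuser
MAPR_CONTAINER_UID=10003
MAPR_MOUNT_PATH=/mapr
MAPR_TZ=America/New_York
# in-container path of the MapR SASL ticket (the target of the -v ...:/tmp/dsr_ticket:ro mount above)
MAPR_TICKETFILE_LOCATION=/tmp/dsr_ticket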


Solution

  • The DSR image seems to have a bug where, even though the MapR SASL ticket specified in the env.list file exists and is valid, it does not get copied into the container at startup, so the container cannot connect to the MapR HDFS. To work around this, I did the following (sketched concretely just after the numbered steps):

    1. Created a file with the same name as the ticket file, at the location inside the container given by the env.list value that specifies where the ticket file should be
    2. Manually copy-pasted the contents of the ticket file from the host into the file just created in the docker container
    3. After waiting a bit (~2 minutes), restarted the MapR POSIX client service: sudo service mapr-posix-client-container restart
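
    Put concretely, the workaround amounts to something like the sketch below. docker cp is just a shortcut for steps 1 and 2 (creating the file and pasting the ticket contents by hand); <container_id> and the in-container ticket path are placeholders -- use the ID from docker ps and whatever path your env.list ticket variable names.

        # on the host: copy the (still valid) host ticket into the running container,
        # at the path the env.list ticket variable points to (placeholder path here)
        docker cp /tmp/maprticket_10003 <container_id>:/tmp/maprticket_10003

        # step 3: after waiting ~2 minutes, from a shell inside the container
        # (docker exec -it <container_id> bash), restart the POSIX client so it picks up the ticket
        sudo service mapr-posix-client-container restart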

    After doing this, the container appears to be able to access the HDFS (and submit YARN jobs) fine.
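
    For what it's worth, this is roughly how I re-checked access from inside the container afterwards (cluster name as in the question; yarn application -list is only a quick check that the client can reach YARN, not an actual job submission):

        # from a shell inside the container, as the container user
        ls -lha /mapr/ourcluster.name.local/   # the FUSE mount is back
        hadoop fs -ls /                        # HDFS access via the hadoop client
        yarn application -list                 # basic check that YARN is reachable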

    (If anyone has more information on why this could be happening, or knows of a better workaround to get the container working as expected, please let me know.)