
H2O Hadoop requires access to user hdfs's HDFS home folder?


Running H2O (http://h2o-release.s3.amazonaws.com/h2o/rel-yau/5/h2o-3.26.0.5-hdp3.1.zip) on HDP 3.1.4, startup fails due to access restrictions on the hdfs:///user/hdfs folder:

[root@HW005 h2o-3.26.0.5-hdp3.1]# hadoop jar h2odriver.jar -nodes 4 -mapperXmx 6g
Determining driver host interface for mapper->driver callback...
    [Possible callback IP address: 172.18.4.83]
    [Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 172.18.4.83:37342
(You can override these with -driverif and -driverport/-driverportrange and/or specify external IP using -extdriverif.)
Memory Settings:
    mapreduce.map.java.opts:     -Xms6g -Xmx6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dlog4j.defaultInitOverride=true
    Extra memory percent:        10
    mapreduce.map.memory.mb:     6758
Hive driver not present, not generating token.
19/09/17 10:38:17 INFO client.RMProxy: Connecting to ResourceManager at hw001.co.local/172.18.4.46:8050
19/09/17 10:38:17 INFO client.AHSProxy: Connecting to Application History server at hw002.co.local/172.18.4.47:10200
ERROR: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

It seems odd that this would be a requirement. I would like to run H2O as various different users depending on the use case, and it doesn't seem right to grant everyone access to the hdfs user's (the default HDP HDFS admin user) HDFS home folder just to do this. Can anyone explain what is going on here and how it would normally be handled?


Solution

  • How to manage impersonation in a non-kerberized Hadoop cluster, for...
    * creating the HDFS HomeDir for an arbitrary Hadoop user
    * running a job under that user (that will use the HomeDir for temp files)

    ## create the HDFS HomeDir for the new user, using the privileged "hdfs" account
    ## note: "o-rwx" is used instead of "o=" to work around a bug in the
    ##  "chmod" mode parser, which fails on "=<nothing>" in most Hadoop versions
    export HADOOP_USER_NAME=hdfs
    hdfs dfs -mkdir -p               /user/zorro
    hdfs dfs -chown zorro:zorro      /user/zorro
    hdfs dfs -chmod u=rwx,g=rx,o-rwx /user/zorro
    unset HADOOP_USER_NAME
    
    ## now impersonate the new user and run the job
    export HADOOP_USER_NAME=zorro
    # sanity check: show which groups Hadoop resolves for this user
    hdfs groups
    hdfs groups
    
    run-my-H2O-job-on-command-line
    unset HADOOP_USER_NAME
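If you need to provision home directories for several users, the creation step above can be folded into a small helper. A minimal sketch, assuming a non-kerberized cluster where `HADOOP_USER_NAME` impersonation is permitted and a privileged `hdfs` account exists; the function name `create_hdfs_home` and the user `zorro` are illustrative, not anything H2O itself requires:

```shell
#!/bin/sh
# create_hdfs_home USER: create /user/USER owned by USER with 750-style perms.
create_hdfs_home() {
    user="$1"
    [ -n "$user" ] || { echo "usage: create_hdfs_home <user>" >&2; return 1; }
    # each command runs impersonating the privileged hdfs account,
    # without disturbing HADOOP_USER_NAME in the calling shell
    HADOOP_USER_NAME=hdfs hdfs dfs -mkdir -p "/user/$user"
    HADOOP_USER_NAME=hdfs hdfs dfs -chown "$user:$user" "/user/$user"
    # "o-rwx" rather than "o=" to dodge the chmod parser bug noted above
    HADOOP_USER_NAME=hdfs hdfs dfs -chmod u=rwx,g=rx,o-rwx "/user/$user"
}
```

After `create_hdfs_home zorro`, you would export `HADOOP_USER_NAME=zorro` and launch the H2O job as shown above. Using per-command environment assignments (rather than export/unset pairs) keeps the privileged impersonation scoped to exactly the three admin commands.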