
Ranger policies don't work for HDFS NFS access


I have a Ranger policy for an HDFS resource that looks like this: [screenshot of the Ranger policy]. Now I'm trying to access that HDFS path via hadoop fs <path to the hdfs location> as two different users:

# as an unauthorized user
[ml1@HW04 ml1c]$ hadoop fs -ls <path to the hdfs location>
ls: Permission denied: user=ml1, access=EXECUTE, inode="<path to the hdfs location>"

# as an authorized user
[hph_etl@HW04 hph_etl]$ hadoop fs -ls <path to the hdfs location>
Found 4 items
drwxrwxr-x   - hph_etl hph_etl          0 2019-07-31 15:13 <path to the hdfs location>
drwxrwxr-x   - hph_etl hph_etl          0 2019-08-07 10:52 <path to the hdfs location>
drwxrwxr-x   - hph_etl hph_etl          0 2019-07-31 14:28 <path to the hdfs location>
drwxrwxr-x   - hph_etl hph_etl          0 2019-07-26 16:12 <path to the hdfs location>

This works as expected. Now I try the same via ls -lh <nfs path to the hdfs location> on the local file system:

# as an unauthorized user
[ml1@HW04 ml1c]$ ls -lh <nfs path to the hdfs location>
total 2.0K
drwxrwxr-x. 4 hph_etl hph_etl 128 Jul 31 15:13 export
drwxrwxr-x. 5 hph_etl hph_etl 160 Aug  7 10:52 import
drwxrwxr-x. 5 hph_etl hph_etl 160 Jul 31 14:28 storage
drwxrwxr-x. 3 hph_etl hph_etl  96 Jul 26 16:12 tests

# as an authorized user
[hph_etl@HW04 hph_etl]$ ls -lh <nfs path to the hdfs location>
total 2.0K
drwxrwxr-x. 4 hph_etl hph_etl 128 Jul 31 15:13 export
drwxrwxr-x. 5 hph_etl hph_etl 160 Aug  7 10:52 import
drwxrwxr-x. 5 hph_etl hph_etl 160 Jul 31 14:28 storage
drwxrwxr-x. 3 hph_etl hph_etl  96 Jul 26 16:12 tests

We see that both users were able to access the HDFS location through NFS, even though only the hph_etl user should have been able to. Does anyone know what's going on here? Any debugging tips or fixes?
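As a debugging aid, a small bash function can run both checks for whichever user is logged in and flag any disagreement. It is a sketch under assumptions: the function name `compare_access` is made up, and it relies only on `hadoop fs -ls` and `ls` returning a non-zero exit status when access is denied. The two paths are passed as arguments rather than filled in, since the post keeps them elided.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can list a location
# through the HDFS client (hadoop fs, where the NameNode's Ranger plugin is
# consulted) and through the local NFS mount of the same location.
# Args: HDFS_PATH NFS_PATH
compare_access() {
  local hdfs_path=$1 nfs_path=$2 hdfs_result=denied nfs_result=denied

  # hadoop fs -ls exits non-zero when it reports "Permission denied".
  if hadoop fs -ls "$hdfs_path" >/dev/null 2>&1; then
    hdfs_result=allowed
  fi

  # ls on the mount goes through the kernel NFS client and the HDFS NFS gateway.
  if ls "$nfs_path" >/dev/null 2>&1; then
    nfs_result=allowed
  fi

  echo "user=$(whoami) hdfs=$hdfs_result nfs=$nfs_result"
  # Return non-zero when the two access paths disagree.
  [ "$hdfs_result" = "$nfs_result" ]
}
```

Usage: `compare_access <path to the hdfs location> <nfs path to the hdfs location>`, run once as each user. The results in the post correspond to `hdfs=denied nfs=allowed` for ml1 and `hdfs=allowed nfs=allowed` for hph_etl.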

UPDATE:

Apparently, this is not unexpected behavior. According to the people I talked to at Hortonworks, the intent is to...

  • mount specific sections of HDFS on machines via NFS, with permissions based on POSIX restrictions
  • then have NiFi (e.g. from HDF) constantly listening on those locations and loading the data into some other Ranger-protected location in HDFS
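The first point means that on the mount, access is decided by ordinary POSIX owner/group/other mode bits rather than by Ranger. To make that concrete, here is a simplified bash sketch (it ignores ACLs, root, and special bits; the function name is made up) that prints the permission triplet POSIX would apply to a given user. The group list passed for ml1 in the usage example is an assumption, because the post does not say which groups ml1 belongs to.

```shell
#!/usr/bin/env bash
# Simplified POSIX permission-class check (ignores ACLs, root, and special bits).
# Args: MODE OWNER GROUP USER [USER_GROUPS...]
# Prints the rwx triplet POSIX applies: owner class if USER is the owner,
# else group class if any of USER_GROUPS matches GROUP, else "other".
posix_triplet() {
  local mode=$1 owner=$2 group=$3 user=$4
  shift 4
  if [ "$user" = "$owner" ]; then
    echo "${mode:1:3}"
    return
  fi
  local g
  for g in "$@"; do
    if [ "$g" = "$group" ]; then
      echo "${mode:4:3}"
      return
    fi
  done
  echo "${mode:7:3}"
}
```

With the mode from the NFS listing, `posix_triplet drwxrwxr-x hph_etl hph_etl ml1 ml1` prints `r-x` (assuming ml1 is not in the hph_etl group), so POSIX alone lets ml1 list and traverse those directories, which matches what `ls -lh` showed.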

To me this seems like a security concern, given that I can easily do something like this:

$ cd /hdfs_nfs_mount/some/private/location
$ head some_private_file.txt
<shows all the contents>

# even when Ranger would rather this user not go there...
$ whoami
<some unauthorized user>
$ hadoop fs -ls /some/private/location
ls: Permission denied: user=<some unauthorized user>, access=EXECUTE, inode="/some/private/location"

This is on a regular cluster node that simply has all of HDFS mounted at the HDFS root. I'm not posting this as an answer because I'm kind of hoping this isn't the answer; I'll keep looking.
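To gauge how much such a mount exposes, one option is to list everything whose POSIX "other" read bit is set, since under POSIX rules any local user can read those entries. A minimal bash sketch, assuming GNU find (which accepts symbolic `-perm` modes); the function name is hypothetical and `/hdfs_nfs_mount` is only the example path from the post.

```shell
#!/usr/bin/env bash
# List entries under ROOT whose "other" read bit is set, i.e. entries that
# POSIX lets any local user read (provided every parent directory is also
# traversable). Uses a symbolic -perm mode as supported by GNU find.
world_readable() {
  local root=$1
  # -perm -o=r: match when at least the other-read bit is set.
  # Errors (e.g. directories this user cannot enter) are discarded.
  find "$root" -perm -o=r -print 2>/dev/null
}
```

For example, `world_readable /hdfs_nfs_mount/some/private/location`. The result is an upper bound, because an entry is only reachable if all of its parent directories are traversable too; on a large mount, pointing it at a subtree keeps the walk short.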


Solution


It seems that the conventional way to use NFS is to...

  • Have the HDFS NFS gateway mounted on an edge node of the cluster
  • Mount this NFS share on a client machine (e.g. via Samba) with write-only POSIX permissions (Apache Ranger simply cannot help here)
  • Use SSSD on the edge node (it can link Unix credentials to Active Directory credentials), and the natural SID or Active Directory on the client node (assuming a Windows machine here), to access the mounted NFS share from the client machine
  • Set up a NiFi (or other ETL) process to detect data placed in this share and bring it into specified HDFS locations (it is at this point that Ranger policies can be enforced)
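The last step is normally a NiFi flow. Purely to illustrate the data movement, here is a hypothetical bash stand-in (not NiFi, and not something Hortonworks described) that pushes files from a landing directory on the share into an HDFS directory with `hadoop fs -put`, i.e. through the HDFS client, where Ranger policies are evaluated. The function and argument names are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the NiFi step: upload every regular file in
# LANDING_DIR into the HDFS directory TARGET_DIR via the HDFS client, then
# delete the local copy. Prints the number of files moved.
# Args: LANDING_DIR TARGET_DIR
ingest_once() {
  local landing_dir=$1 target_dir=$2 f moved=0
  for f in "$landing_dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
    # hadoop fs -put fails if the write is denied or the target file exists.
    if hadoop fs -put "$f" "$target_dir/"; then
      rm -f -- "$f"                  # remove only after a successful upload
      moved=$((moved + 1))
    else
      echo "failed to ingest $f" >&2
    fi
  done
  echo "$moved"
}
```

It could be run periodically (e.g. from cron) as `ingest_once /path/to/landing /path/in/hdfs`, both paths hypothetical. Unlike NiFi, it does not wait for a file to finish arriving; NiFi's GetFile processor, for instance, has a Minimum File Age property for that.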

Thus, the HDFS NFS Gateway is not ideal for reading files or browsing HDFS. For that, the recommendation is to create user accounts in Ambari for the various cluster users and give them access to the Ambari Files View for browsing and downloading files (access there is protected by Ranger policies).