Tags: hadoop, hive, kerberos, hdp

Why would I Kerberise my Hadoop (HDP) cluster if it already uses AD/LDAP?


I have an HDP cluster.

This cluster is configured to use Active Directory as its authentication and authorization authority. To be more specific, we use Ranger to limit access to HDFS directories, Hive tables and YARN queues after a user has provided a correct username/password combination.

I have been tasked with Kerberising the cluster, which is very easy thanks to the "press buttons and skip"-style option in Ambari.

We Kerberised a test cluster. While interacting with Hive does not require any modification to our existing scripts on the cluster's machines, it is very, very difficult to find a way for end users to interact with Hive from OUTSIDE the cluster (Power BI, DbVisualizer, a PHP application).

Kerberising seems to bring a lot of unnecessary work.

What concrete benefits would I get from Kerberising the cluster (apart from making the people above me in the hierarchy happy because, hey, we Kerberised, yoohoo)?

Edit:

One benefit:

  • Kerberising the cluster gives more security because the cluster runs on Linux machines, which the company's Active Directory is not able to manage.

Solution

  • Ranger with AD/LDAP authentication and authorization is OK for external users, but AFAIK it will not secure machine-to-machine or command-line interactions.

    I'm not sure if it still applies, but on a Cloudera cluster without Kerberos you could fake a login by setting the environment variable HADOOP_USER_NAME on the command line:

    sh-4.1$ whoami
    ali
    sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
    ls: Permission denied: user=ali, access=READ_EXECUTE, inode="/tmp/hive/zeppelin":zeppelin:hdfs:drwx------
    sh-4.1$ export HADOOP_USER_NAME=hdfs
    sh-4.1$ hadoop fs -ls /tmp/hive/zeppelin
    Found 4 items
    drwx------   - zeppelin hdfs          0 2015-09-26 17:51 /tmp/hive/zeppelin/037f5062-56ba-4efc-b438-6f349cab51e4
    

    For machine-to-machine communication, tools like Storm, Kafka, Solr or Spark are not secured by Ranger, but they are secured by Kerberos, so only dedicated processes holding valid Kerberos credentials can use those services.

    Source: https://community.cloudera.com/t5/Support-Questions/Kerberos-AD-LDAP-and-Ranger/td-p/96755

    Update: Apparently, Kafka and Solr integration has since been implemented in Ranger.
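To make the HADOOP_USER_NAME point concrete, here is a toy Python model of how a Hadoop client ends up deciding "who you are". It is not Hadoop's actual code: `resolve_client_user`, `strip_realm` and the `EXAMPLE.COM` realm are hypothetical names for illustration. It assumes, following my understanding of Hadoop's `UserGroupInformation` login logic, that the environment variable is only honoured when security is disabled (simple auth), while under Kerberos the identity comes from the ticket obtained with `kinit` and is reduced to a short name roughly like the `DEFAULT` `auth_to_local` rule.

```python
"""Toy model (NOT Hadoop source code) of client identity resolution
under simple authentication vs Kerberos. All names are hypothetical."""
from typing import Mapping, Optional


def strip_realm(principal: str) -> str:
    """Rough stand-in for the DEFAULT auth_to_local rule: keep the first
    component of a principal, e.g. 'ali@EXAMPLE.COM' -> 'ali' and
    'hive/node1.example.com@EXAMPLE.COM' -> 'hive'. (The real rule only
    applies to principals in the default realm.)"""
    return principal.split("@", 1)[0].split("/", 1)[0]


def resolve_client_user(auth_mode: str,
                        env: Mapping[str, str],
                        os_user: str,
                        kerberos_principal: Optional[str]) -> str:
    """Return the user name the cluster would see for this client."""
    if auth_mode == "simple":
        # Simple auth: the client asserts its own name and nothing verifies
        # it, so exporting HADOOP_USER_NAME=hdfs is enough to become 'hdfs'.
        return env.get("HADOOP_USER_NAME") or os_user
    if auth_mode == "kerberos":
        # Kerberos: identity comes from a ticket that required a password or
        # keytab to obtain; HADOOP_USER_NAME is not consulted at all.
        if kerberos_principal is None:
            raise PermissionError("no valid Kerberos ticket (run kinit first)")
        return strip_realm(kerberos_principal)
    raise ValueError(f"unknown auth mode: {auth_mode!r}")


if __name__ == "__main__":
    spoofed_env = {"HADOOP_USER_NAME": "hdfs"}
    # Same spoofing attempt as in the shell session above.
    print(resolve_client_user("simple", spoofed_env, "ali", None))  # hdfs
    print(resolve_client_user("kerberos", spoofed_env, "ali",
                              "ali@EXAMPLE.COM"))                   # ali
```

The point of the model is that Ranger policies are only as trustworthy as the user name they are evaluated against: with simple auth that name is whatever the client claims, whereas with Kerberos it is backed by a credential the client must actually hold.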