java hadoop cryptography hortonworks-data-platform

Why do we need Hadoop KMS?


I am not sure what exactly we need Hadoop KMS for. I went through the official Apache Hadoop documentation, and it does not clearly say why this concept is needed. The only thing that is clear to me is that, using it, a client and server can share keys over HTTP or HTTPS through REST APIs, and that there are various ways of authenticating. Is it there for security reasons? Can somebody explain in layman's terms what exactly it is?
Do correct me if I am wrong anywhere.


Solution

  • KMS is basically part of the HDFS native data encryption feature, and it is used for storing the encryption keys. With it, you can encrypt selected files or directories in HDFS without any application code change.

    An HDFS administrator sets up encryption, and then HDFS takes care of the actual encryption and decryption, so the end user never needs to encrypt or decrypt a file manually. The following terminology describes the key areas of transparent data encryption (TDE):

    Encryption Zone - An HDFS admin creates an encryption zone and then links it to an empty HDFS directory and an encryption key. Any files that are put in the directory are automatically encrypted by HDFS.

    Key Management Server (KMS) - The KMS is responsible for storing the encryption key. The KMS provides a REST API and access control on keys that are stored in the KMS.

    Key Provider API - The glue used by the HDFS NameNode and the HDFS client to connect with the Key Management Server.

    Reference: Enabling transparent data encryption
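
To make the roles in the terminology above more concrete, here is a minimal Java sketch of the underlying idea (often called envelope encryption): the zone key never leaves the key server, each file gets its own data key, and only the encrypted form of that data key is stored next to the file. This is an illustration under stated assumptions, not Hadoop's actual code or API. The class `ToyKms` and its methods are hypothetical stand-ins, everything runs in one JVM with no REST calls or access control, and AES-GCM is used for brevity even though HDFS's default cipher suite is AES/CTR/NoPadding.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class EnvelopeEncryptionSketch {

    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;   // GCM nonce length
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    /** Hypothetical stand-in for the KMS: the only place that holds the zone key. */
    static final class ToyKms {
        private final SecretKey zoneKey;

        ToyKms() throws Exception {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            zoneKey = generator.generateKey();
        }

        /** Creates a fresh per-file data key and returns it only in encrypted form. */
        byte[] generateEncryptedDataKey() throws Exception {
            byte[] dataKey = new byte[16];
            RANDOM.nextBytes(dataKey);
            return encrypt(zoneKey, dataKey);
        }

        /** Unwraps a data key; a real KMS would check the caller's key ACLs first. */
        byte[] decryptDataKey(byte[] encryptedDataKey) throws Exception {
            return decrypt(zoneKey, encryptedDataKey);
        }
    }

    /** Encrypts with AES-GCM and returns IV followed by ciphertext and tag. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
        return out;
    }

    /** Reverses encrypt(): reads the IV from the first 12 bytes, then decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
        return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        ToyKms kms = new ToyKms();

        // NameNode role: keep only the encrypted data key with the file's metadata.
        byte[] encryptedDataKey = kms.generateEncryptedDataKey();

        // Client role (write): ask the key server to unwrap the key, then encrypt locally.
        SecretKey dataKey = new SecretKeySpec(kms.decryptDataKey(encryptedDataKey), "AES");
        byte[] storedBlock = encrypt(dataKey, "hello hdfs".getBytes(StandardCharsets.UTF_8));

        // Client role (read): unwrap the same key again and decrypt the stored bytes.
        SecretKey sameKey = new SecretKeySpec(kms.decryptDataKey(encryptedDataKey), "AES");
        String recovered = new String(decrypt(sameKey, storedBlock), StandardCharsets.UTF_8);
        System.out.println(recovered);
    }
}
```

The point of the split is that the storage layer only ever sees the encrypted data key and the encrypted bytes, so someone who can read the stored data still needs the key server's permission to decrypt it. That separation is why the keys live in a separate service with its own access control.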