Tags: hadoop, permissions, hdfs, acl, bigdata

How granular is access control on HDFS for unstructured data?


I am looking for a technical paper that explains how access control is applied to unstructured data ingested into HDFS.

  1. Can access control be more granular than POSIX-style file permissions?

  2. Similarly, how would products such as Cloudera's RecordService, which provides a security abstraction layer over storage components, handle unstructured data?


For instance, if I have a very large email archive file (more than a terabyte), could I define an ACL that is finer-grained than one covering the entire file? I am thinking of email headers, for example.


Solution

    1. Access control is supported down to the row and column levels. See details.
    2. Currently, RecordService requires your data to be organized as Hive Metastore tables. In the future, RecordService may infer the structure or schema from the files themselves, but that is not the case today.
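Because row- and column-level control only applies once the data has a schema, an email archive would first have to be turned into a table. The Python sketch below is a hypothetical illustration of that idea, not RecordService's API: it parses raw messages into rows whose columns are header fields, then applies a made-up `Policy` that filters rows (by sender domain) and projects columns (hiding the body). `COLUMNS`, `Policy`, `to_row` and `apply_policy` are all assumptions for illustration. For a multi-terabyte archive, the parsing step would presumably run as a distributed job that writes a Hive table, which is what the enforcement layer would then see.

```python
"""Hypothetical sketch: give an email archive a tabular shape, then apply
row- and column-level rules. This is NOT the RecordService API."""
from dataclasses import dataclass
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Dict, FrozenSet, Iterable, List

# Assumed schema: one row per message, header fields as columns.
COLUMNS = ("message_id", "from", "to", "subject", "date", "body")


def to_row(msg: Message) -> Dict[str, str]:
    """Flatten one parsed email into a dict keyed by COLUMNS."""
    payload = msg.get_payload()
    # Multipart bodies are skipped in this sketch (payload is then a list).
    body = payload if isinstance(payload, str) else ""
    return {
        "message_id": msg.get("Message-ID", ""),
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "body": body.strip(),
    }


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule for one role: visible columns and allowed rows."""
    columns: FrozenSet[str]
    allowed_sender_domains: FrozenSet[str]


def apply_policy(rows: Iterable[Dict[str, str]], policy: Policy) -> List[Dict[str, str]]:
    visible = []
    for row in rows:
        _, address = parseaddr(row["from"])
        domain = address.rsplit("@", 1)[-1].lower()
        if domain not in policy.allowed_sender_domains:
            continue  # row-level filter: drop messages from other domains
        # Column-level projection: keep only columns the role may see.
        visible.append({c: row[c] for c in COLUMNS if c in policy.columns})
    return visible


RAW = [
    "Message-ID: <1@example.com>\nFrom: Alice <alice@example.com>\n"
    "To: bob@example.com\nSubject: Q3 plan\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n\nConfidential body text.\n",
    "Message-ID: <2@other.org>\nFrom: carol@other.org\n"
    "To: bob@example.com\nSubject: Hello\n"
    "Date: Tue, 2 Jan 2024 11:00:00 +0000\n\nAnother body.\n",
]

rows = [to_row(message_from_string(raw)) for raw in RAW]
analyst = Policy(
    columns=frozenset({"from", "subject", "date"}),
    allowed_sender_domains=frozenset({"example.com"}),
)
visible = apply_policy(rows, analyst)
print(visible)
# [{'from': 'Alice <alice@example.com>', 'subject': 'Q3 plan', 'date': 'Mon, 1 Jan 2024 10:00:00 +0000'}]
```

The point of the split is that plain HDFS permissions can only say yes or no to the whole archive file; once the archive is a table, a policy can hide the `body` column or whole rows while still exposing headers.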