
Should the ORC files that a Hive table (ORC type) points to contain all the attributes in the Hive table?


I have a Hive table that points to an S3 path (s3:///table/data/) containing multiple ORC files. I have a job that writes files to that prefix, but the order of the attributes is not guaranteed to be the same across files, and not all attributes are populated, i.e. some files may contain only a subset of the columns.

So, can the Hive table map the data in each file to the appropriate column names and return the correct value for each column in a query?


Solution

  • No. A Hive ORC table simply reads the data in the order of the columns defined in the table; it does not match file columns to table columns by name.

    If the order of attributes is not guaranteed, the table still reads each file according to the table schema: when the data type at a given position matches, the value is displayed; otherwise Hive tries to convert the value to the column's type, and returns NULL if it cannot.

    You probably need to create an Avro table instead; the table will then resolve each column to the correct value based on the Avro schema.
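
To make the difference concrete, here is a minimal toy model in Python of the two mapping strategies described in the answer. It is not Hive or ORC code: the table schema, the file columns, and the helper names (`coerce`, `read_by_position`, `read_by_name`) are all hypothetical, and Python's type conversion only stands in for Hive's own conversion rules, which differ in detail.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical table schema: (column name, Python type standing in for the Hive type).
TABLE_SCHEMA: List[Tuple[str, type]] = [("id", int), ("name", str), ("price", float)]


def coerce(value: Any, target: type) -> Optional[Any]:
    """Convert value to the target type; return None (NULL) if that is not possible."""
    if value is None:
        return None
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


def read_by_position(file_columns: List[str], row: List[Any]) -> Dict[str, Any]:
    """Positional mapping: the i-th file value feeds the i-th table column.

    The file's own column names are ignored, as the answer describes for ORC.
    """
    result: Dict[str, Any] = {}
    for i, (column, column_type) in enumerate(TABLE_SCHEMA):
        value = row[i] if i < len(row) else None  # missing trailing columns become NULL
        result[column] = coerce(value, column_type)
    return result


def read_by_name(file_columns: List[str], row: List[Any]) -> Dict[str, Any]:
    """Name-based mapping: each table column is looked up by name in the file."""
    by_name = dict(zip(file_columns, row))
    return {
        column: coerce(by_name.get(column), column_type)
        for column, column_type in TABLE_SCHEMA
    }


# A file written with a different column order and without the "price" column.
file_columns = ["name", "id"]
row = ["widget", 7]

positional = read_by_position(file_columns, row)
named = read_by_name(file_columns, row)

print(positional)  # {'id': None, 'name': '7', 'price': None}
print(named)       # {'id': 7, 'name': 'widget', 'price': None}
```

With positional mapping, the string lands in the `id` slot and becomes NULL, while the integer is converted into the `name` column. With name-based mapping, the reordered file reads back correctly and the missing `price` column is simply NULL.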