Search code examples
hadoophivebigdatahiveqlorc

Will all nested columns be read in an ORC file?


Say I have a Hive query like so:

CREATE TABLE student (key string, name string, course struct<class_name:string, class_teacher:string>) 
STORED AS ORC;

Because of the ORC file format, this will create 5 different columns:

| key | name | course | course.class_name | course.class_teacher |

When attempting to read only the course.class_name column, will both subcolumns of course be read anyway, every time? As in, both course.class_name and course.class_teacher ? As far as I am aware, ORC file format allows Hive to load only the columns needed for the query. So what will it actually do?


Solution

  • Your understanding is right. It will read the only column which is specified in the select query.
    If select query is :

    Select course.class_name from student
    

    It will return only class_name.
    In case if you want to query all the columns for the struct, you can use following.

    select key, name, c.* from students 
       lateral view inline (array(course)) c