Say I have a Hive query like so:
CREATE TABLE student (key string, name string, course struct<class_name:string, class_teacher:string>)
STORED AS ORC;
Because of the ORC file format, this will create 5 different columns:
| key | name | course | course.class_name | course.class_teacher |
When attempting to read only the course.class_name
column, will both subcolumns of course
be read anyway, every time? As in, both course.class_name
and course.class_teacher
? As far as I am aware, ORC file format allows Hive to load only the columns needed for the query. So what will it actually do?
Your understanding is right. It will read the only column which is specified in the select query.
If select query is :
Select course.class_name from student
It will return only class_name.
In case if you want to query all the columns for the struct, you can use following.
select key, name, c.* from students
lateral view inline (array(course)) c