
Hive table with only a subset of fields from a Parquet file


I am creating a Hive table like so:

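CREATE EXTERNAL TABLE test (
  Col1 STRING,
  Col2 STRING)
STORED AS PARQUET
-- LOCATION takes the directory that contains file.parquet, not the file itself (path below is hypothetical)
LOCATION '/path/to/parquet_dir';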

My question is: if the Parquet file has 100 fields and I need the table to use only 5 of them, can I list just those 5 column names in the table definition, or do I need to do something different?


Solution

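  • Yes, this works. You can create the external table with only the columns you need. I tested this by writing a Parquet file with 6 columns to an external path and then creating an external table with 3 columns on top of it; querying the table returned only those 3 columns. The column names you declare must match field names in the Parquet schema, because Hive's Parquet reader maps table columns to file columns by name rather than by position. A declared column with no matching field in the file reads as NULL.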

    Note: if you need all of the columns, you can still get them with Spark by reading the Parquet files directly from the external path.
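The experiment described in the answer can be reproduced with a short HiveQL script. This is a minimal sketch, assuming Hive 0.14 or later (for INSERT ... VALUES) and write access to the directory; the table names wide_src and narrow_subset and the path /tmp/parquet_subset_demo are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table names and path, for illustration only.

-- 1. Write a Parquet file with six columns to an external directory.
CREATE EXTERNAL TABLE wide_src (
  c1 STRING, c2 STRING, c3 STRING,
  c4 STRING, c5 STRING, c6 STRING)
STORED AS PARQUET
LOCATION '/tmp/parquet_subset_demo';

INSERT INTO TABLE wide_src VALUES ('a', 'b', 'c', 'd', 'e', 'f');

-- 2. Define a second external table over the same directory that
--    declares only three of the six columns.
CREATE EXTERNAL TABLE narrow_subset (
  c1 STRING,
  c3 STRING,
  c5 STRING)
STORED AS PARQUET
LOCATION '/tmp/parquet_subset_demo';

-- 3. Only the declared columns are returned. Because columns are matched
--    by name, c3 and c5 do not need to be adjacent in the file.
SELECT * FROM narrow_subset;
```

Picking non-adjacent columns (c1, c3, c5) in step 2 is deliberate: if the query returns the values written for those names rather than the first three fields of the file, that supports name-based rather than positional mapping.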