
Querying Parquet file in HDFS using Impala


I'm trying to query Parquet files stored in HDFS directly with Impala, for example:

impala-shell> SELECT * FROM `/path/in/hdfs/*.parquet`

I know I can do this with Spark or Drill, but is it possible with Impala?

Thanks


Solution

  • Impala cannot query Parquet files by their HDFS path. You need to create a table (typically an external table) on top of the Parquet files and then query that table.

    Here is a general example of an external table pointing to a directory of Parquet files. The Cloudera docs describe all the available methods:

    https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_ddl

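    -- LIKE PARQUET takes the column definitions from the schema of this existing data file;
    -- the table then reads the data files in the LOCATION directory.
    CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
      STORED AS PARQUET
      LOCATION '/user/etl/destination';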
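Once the external table exists, you query it like any other Impala table. The following is a minimal sketch, assuming the `ingest_existing_files` table from the example above. The `events_explicit` table and its `id`/`name` columns are hypothetical and only illustrate declaring the schema by hand instead of using `LIKE PARQUET`.

```sql
-- Query the Parquet data through the table instead of by file path.
SELECT * FROM ingest_existing_files LIMIT 10;

-- If new Parquet files are copied into /user/etl/destination outside of Impala,
-- reload the table's file metadata so that Impala sees them.
REFRESH ingest_existing_files;

-- Optional: collect table and column statistics to help the query planner.
COMPUTE STATS ingest_existing_files;

-- Alternative (hypothetical names): list the columns yourself. They must match
-- the columns stored in the Parquet files.
CREATE EXTERNAL TABLE events_explicit (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION '/user/etl/destination';
```

`REFRESH` matters when files arrive through another tool, for example `hdfs dfs -put` or a Spark job writing into the same directory.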