
Presto query error on hive ORC, Can not read SQL type real from ORC stream of type DOUBLE


I was able to run queries in Presto that read the non-float columns from a Hive ORC (Snappy-compressed) table. However, when I select all the float columns through the Presto CLI, I get the error message below. Is there any alternative other than changing the field type to double in the target Hive table?

presto:sample> select * from emp_detail;

Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]


Solution

  • Please try to add this property

    hive.orc.use-column-names=true
    

    to presto-server/conf/catalog/hive.properties, and restart your Presto server.

    To test it without restarting the server, run this from the Presto CLI (hive here is the catalog name):

    SET SESSION hive.orc_use_column_names=true;
    

    Release notes from Presto regarding this property.
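A plausible explanation of why this property helps: by default the Hive connector matches ORC file columns to table columns by position, not by name. If the files were written with a different column order than the table declares, a FLOAT column (Presto SQL type real) can end up being read from a DOUBLE stream such as .salary. The Python sketch below only simulates the two mapping strategies on made-up schemas; TABLE_COLUMNS, ORC_FILE_COLUMNS and the helper functions are hypothetical and are not Presto APIs, and nothing here reads real ORC files.

```python
from typing import Dict, List, Tuple

# (column name, type name). Both schemas are hypothetical examples,
# not the real emp_detail definition, which the question does not show.
Column = Tuple[str, str]

# Physical column order inside the ORC files (as the writer produced them).
ORC_FILE_COLUMNS: List[Column] = [
    ("emp_id", "INT"),
    ("salary", "DOUBLE"),
    ("bonus", "FLOAT"),
]

# Column order declared for the table in the Hive metastore.
# Hive FLOAT surfaces in Presto as the SQL type real.
TABLE_COLUMNS: List[Column] = [
    ("emp_id", "INT"),
    ("bonus", "FLOAT"),
    ("salary", "DOUBLE"),
]


def map_by_position(table: List[Column], orc_file: List[Column]) -> List[Tuple[Column, Column]]:
    """Ordinal mapping: table column i is read from ORC column i; names are ignored."""
    return list(zip(table, orc_file))


def map_by_name(table: List[Column], orc_file: List[Column]) -> List[Tuple[Column, Column]]:
    """Name-based mapping: each table column is read from the ORC column with the same name.

    Table columns with no matching file column are skipped in this sketch.
    """
    by_name: Dict[str, Column] = {name.lower(): (name, typ) for name, typ in orc_file}
    return [(col, by_name[col[0].lower()]) for col in table if col[0].lower() in by_name]


def type_errors(pairs: List[Tuple[Column, Column]]) -> List[str]:
    """Describe every pair whose declared table type differs from the stored ORC type."""
    errors = []
    for (tbl_name, tbl_type), (orc_name, orc_type) in pairs:
        if tbl_type.upper() != orc_type.upper():
            errors.append(
                f"table column {tbl_name} ({tbl_type}) read from ORC stream .{orc_name} ({orc_type})"
            )
    return errors


if __name__ == "__main__":
    for label, mapper in (("by position", map_by_position), ("by name", map_by_name)):
        problems = type_errors(mapper(TABLE_COLUMNS, ORC_FILE_COLUMNS))
        print(f"{label}: {problems or 'no type mismatches'}")
```

Running it reports two mismatches for positional mapping (the first is a FLOAT column read from the DOUBLE .salary stream, mirroring the error in the question) and none for name-based mapping. If the table and file genuinely disagree on the type of the same named column, switching to name-based mapping would not remove the error.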