Search code examples
hadoophiveapache-spark-sqlhdfsorc

On HDFS, I want to display normal text for a hive table stored in ORC format


I have saved json dataframe in Hive using orc format

jsonDF.write.format("orc").saveAsTable(hiveExamples.jsonTest)

Now I need to display the file as a normal text on HDFS. Is there away to do this?

I have used hdfs dfs -text /path-of-table, but it displays the data in ORC format.


Solution

  • From the linux shell command there is an utility called hive --orcfiledump

    To see the metadata of an ORC file in HDFS you can invoke the command like:

    [@localhost ~ ]$ hive --orcfiledump <path to HDFS ORC file>;
    

    To see the content of your ORC file in plain text you can invoke the command with -d option:

    [@localhost ~ ]$ hive --orcfiledump -d <path to HDFS ORC file>;
    

    As an example:

    hive> DESCRIBE FORMATTED orders_orc;
    Location:  hdfs://localhost:8020/user/hive/warehouse/training_retail.db/orders_orc
    # Storage Information        
    SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde    
    InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
    OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
    
    hive> exit;
    
    [@localhost ~ ]$ hdfs dfs -ls /user/hive/warehouse/training_retail.db/orders_orc
    Found 1 items
    -rwxrwxrwx   1 training hive     163094 2020-04-20 09:39 /user/hive/warehouse/training_retail.db/orders_orc/000000_0
    
    [@localhost ~ ]$ hdfs dfs -tail /user/hive/warehouse/training_retail.db/orders_orc/000000_0
    ��+"%ў�.�b.����8V$tߗ��\|�?�xM��
                          *�ڣ�������!�2���_���Ͳ�V���
                                                         r�E(����~�uM�/&��&x=-�&2�T��o��JD���Q��m5��#���8Iqe����A�^�ێ"���@�t�w�m�A ���3|�����NL�Q����p�d�#:}S-D�Wq�_"����
    
    [@localhost ~ ]$ hive --orcfiledump /user/hive/warehouse/training_retail.db/orders_orc/000000_0;
    Structure for /user/hive/warehouse/training_retail.db/orders_orc/000000_0
    File Version: 0.12 with HIVE_8732
    20/04/20 10:19:58 INFO orc.ReaderImpl: Reading ORC rows from /user/hive/warehouse/training_retail.db/orders_orc/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
    Rows: 68883
    Compression: ZLIB
    Compression size: 262144
    Type: struct<_col0:int,_col1:string,_col2:int,_col3:string>
    ....
    File length: 163094 bytes
    Padding length: 0 bytes
    Padding ratio: 0%
    
    [@localhost ~ ]$ hive --orcfiledump -d /user/hive/warehouse/training_retail.db/orders_orc/000000_0 | head -n 5
    {"_col0":1,"_col1":"2013-07-25 00:00:00.0","_col2":11599,"_col3":"CLOSED"}
    {"_col0":2,"_col1":"2013-07-25 00:00:00.0","_col2":256,"_col3":"PENDING_PAYMENT"}
    {"_col0":3,"_col1":"2013-07-25 00:00:00.0","_col2":12111,"_col3":"COMPLETE"}
    {"_col0":4,"_col1":"2013-07-25 00:00:00.0","_col2":8827,"_col3":"CLOSED"}
    {"_col0":5,"_col1":"2013-07-25 00:00:00.0","_col2":11318,"_col3":"COMPLETE"}
         
    

    You can follow this link for details:

    How to see contents of Hive orc files in linux