
Creating output in ORCFile format


I need to create output in ORCFile format. According to this page (http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/), it is the best option, offering better compression and better performance.

Questions:

1) What codec should I use to create files in ORCFile format?

2) Are files created in this format readable with the -text option? For example:

hadoop fs -cat -text /tmp/a.orc

3) Any other pointers? Is it too early to use this format? What are the pros and cons?

Thanks.


Solution

  • To store data in ORC format in Hive, add "stored as orc" at the end of the table definition and then load your data into the table. You can also use Sqoop to import directly into ORC through its HCatalog import options.

    Hive also includes a tool called orcfiledump (run as hive --orcfiledump <path>) that helps you analyze data stored as ORC by listing its columns, their types and their statistics.

    You can't use -cat (or -text) to read ORC files directly, because ORC is a binary columnar format, but you can easily export ORC data to a CSV file by querying the table through Hive.
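The table-creation step can be sketched as a small shell helper that prints the Hive DDL. This is a minimal sketch, not a definitive recipe: it assumes the Hive CLI (hive -e) is available, and the helper name orc_table_ddl and all table and column names are hypothetical. It also touches on the codec question: ORC compresses its data internally, and the codec is chosen per table with the orc.compress table property (ZLIB by default) rather than with a Hadoop output codec.

```shell
#!/usr/bin/env bash
# Sketch: print the HiveQL for an ORC-backed table.
# ORC compresses its own data; the codec is set with the "orc.compress"
# table property (e.g. NONE, ZLIB or SNAPPY; ZLIB is the default).
# orc_table_ddl is a hypothetical helper; arguments are placeholders.
orc_table_ddl() {
  local table="$1" columns="$2" codec="${3:-ZLIB}"
  printf 'CREATE TABLE %s (%s) STORED AS ORC TBLPROPERTIES ("orc.compress"="%s");\n' \
    "$table" "$columns" "$codec"
}

# Usage (needs the hive CLI; sales_text and sales_orc are made-up tables):
#   hive -e "$(orc_table_ddl sales_orc 'id INT, amount DOUBLE' SNAPPY)"
#   hive -e "INSERT OVERWRITE TABLE sales_orc SELECT id, amount FROM sales_text;"
#
# Sqoop (1.4.4 or later) can write straight into an ORC table via HCatalog.
# The JDBC URL, user and table names below are placeholders:
#   sqoop import --connect jdbc:mysql://dbhost/shop --username etl -P \
#     --table orders \
#     --hcatalog-database default --hcatalog-table orders_orc \
#     --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
```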
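To inspect a whole ORC table with orcfiledump, one option is to list the files in the table's directory and dump each one. This is a hedged sketch: it assumes hadoop and hive are on the PATH and that the listing comes from hadoop fs -ls; the helper name orc_files_from_ls and the warehouse path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract regular-file paths from `hadoop fs -ls` output.
# The listing starts with a "Found N items" header; each entry line begins
# with a permission string ("-" for files, "d" for directories) and ends
# with the path. Paths containing spaces are not handled.
orc_files_from_ls() {
  awk '$1 ~ /^-/ { print $NF }'
}

# Usage (needs hadoop and hive; the directory is a hypothetical table location):
#   hadoop fs -ls /user/hive/warehouse/sales_orc |
#     orc_files_from_ls |
#     while read -r f; do hive --orcfiledump "$f"; done
```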
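For the CSV export, one simple approach (an assumption, not the only way) is to query the table through the Hive CLI, which prints rows as tab-separated text, and convert the tabs to commas. The helper name tsv_to_csv and the table name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn the tab-separated rows printed by the Hive CLI into CSV.
# This is a naive conversion: fields are not quoted, so values containing
# commas, tabs or newlines will not survive the round trip.
tsv_to_csv() {
  tr '\t' ','
}

# Usage (needs the hive CLI; sales_orc is a hypothetical ORC table):
#   hive -e 'SELECT * FROM sales_orc' | tsv_to_csv > sales_orc.csv
```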