From Hive's docs:
If the table or partition contains many small RCFiles or ORC files, then the above command will merge them into larger files. In case of RCFile the merge happens at block level whereas for ORC files the merge happens at stripe level thereby avoiding the overhead of decompressing and decoding the data.
My question is: What is the difference between a block and a stripe?
HDFS blocks are the lowest storage level, and ORC stripes sit at a higher level. The two levels are completely independent: ORC stripes do not care about the underlying storage layer.
HDFS blocks:
ORC stripes:
The upper level of storage. A stripe knows nothing about blocks.
ORC is splittable at the stripe level. HDFS knows nothing about the ORC structure or how a file can be split for processing; it splits files into blocks only to optimize storage. The smallest unit that a single container can process is one stripe. You can configure the stripe size to fit the block size.
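To make the independence of the two levels concrete, here is a minimal sketch in Python that maps stripes onto HDFS blocks by byte offset. It is a simplified model, not how ORC or HDFS are implemented: it assumes stripes are laid out back to back from offset 0 and ignores the ORC header, footer, and any padding. The function name `blocks_for_stripes` and the sizes are made up for illustration.

```python
MB = 1024 * 1024


def blocks_for_stripes(stripe_sizes, block_size):
    """For each stripe, list the indexes of the HDFS blocks its bytes fall into.

    Simplified model (assumption): stripes are stored contiguously starting
    at offset 0, with no header, footer, or padding between them.
    """
    result = []
    offset = 0
    for size in stripe_sizes:
        first_block = offset // block_size
        last_block = (offset + size - 1) // block_size
        result.append(list(range(first_block, last_block + 1)))
        offset += size
    return result


# Three 64 MB stripes on 128 MB blocks: every stripe fits inside one block.
fitting = blocks_for_stripes([64 * MB] * 3, 128 * MB)
print(fitting)      # [[0], [0], [1]]

# Three 96 MB stripes on 128 MB blocks: stripes straddle block boundaries.
straddling = blocks_for_stripes([96 * MB] * 3, 128 * MB)
print(straddling)   # [[0], [0, 1], [1, 2]]
```

In the second case the file is still perfectly valid ORC and still splittable per stripe; the only consequence is that a container reading a straddling stripe has to fetch bytes from more than one block, which is what fitting the stripe size to the block size avoids.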
Some useful links for a better understanding:
Big ORC stripes and block padding in S3 - very useful blog