java hadoop hive orc

How to append to an ORC file


We have a requirement to append data to existing ORC files. I searched for a way to do this but found nothing. The ORC writer class, org.apache.hadoop.hive.ql.io.orc.WriterImpl, does not have an append API either. Is there any way to append to ORC files? (More specifically, using Java.)


Solution

  • ORC data files are divided into independent stripes; each stripe is created in a single atomic step. See the official documentation for details.

    I don't believe you can directly append to an existing file on-the-fly. That would mean leaving a corrupt stripe (hence a corrupt file) in case of a job crash while writing.

    But you can:

    • create a new ORC data file (which will contain 1..N stripes depending on actual data volume vs. orc.stripe.size property) per reducer
    • then "concatenate" these data files -- and existing file(s) -- using the ALTER TABLE ... CONCATENATE command, which supports ORC files in Hive V0.14 and above
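
Since the question asks about Java, the concatenation step could be driven from Java over JDBC. The following is only a minimal sketch under some assumptions: the files belong to a Hive table stored as ORC, a Hive JDBC driver is on the classpath, and the caller already holds an open java.sql.Connection to HiveServer2. The class name OrcConcatenate and both method names are hypothetical; only the ALTER TABLE ... [PARTITION (...)] CONCATENATE statement itself is Hive DDL.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hive or ORC): builds and runs the
// Hive DDL statement that merges the ORC files of a table or partition.
final class OrcConcatenate {
    // Accepts "table" or "db.table"; deliberately strict so that nothing
    // unexpected ends up in the generated statement text.
    private static final Pattern TABLE =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");
    private static final Pattern COLUMN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private OrcConcatenate() {}

    // Returns "ALTER TABLE <table> [PARTITION (k='v', ...)] CONCATENATE".
    // Partition entries are emitted in the map's iteration order.
    static String buildStatement(String table, Map<String, String> partition) {
        if (!TABLE.matcher(table).matches()) {
            throw new IllegalArgumentException("Unsupported table name: " + table);
        }
        StringBuilder sql = new StringBuilder("ALTER TABLE ").append(table);
        if (!partition.isEmpty()) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, String> e : partition.entrySet()) {
                String key = e.getKey();
                String value = e.getValue();
                if (!COLUMN.matcher(key).matches()) {
                    throw new IllegalArgumentException("Unsupported partition column: " + key);
                }
                // Assumption of this sketch: refuse values that would need
                // escaping rather than trying to escape them.
                if (value.indexOf('\'') >= 0 || value.indexOf('\\') >= 0) {
                    throw new IllegalArgumentException("Unsupported partition value: " + value);
                }
                parts.add(key + "='" + value + "'");
            }
            sql.append(" PARTITION (").append(String.join(", ", parts)).append(")");
        }
        return sql.append(" CONCATENATE").toString();
    }

    // Runs the statement on an already-open connection (assumed to point
    // at HiveServer2); the caller remains responsible for closing it.
    static void concatenate(Connection hive, String table, Map<String, String> partition)
            throws SQLException {
        try (Statement st = hive.createStatement()) {
            st.execute(buildStatement(table, partition));
        }
    }
}
```

For example, calling OrcConcatenate.concatenate(conn, "web.events", partition) with a partition map containing ds=2015-06-01 would issue ALTER TABLE web.events PARTITION (ds='2015-06-01') CONCATENATE; the table and partition names here are purely illustrative. New data still has to be written to its own ORC file(s) first, as described above; this step only merges files that already exist.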