
How can I convert local ORC files to CSV?


I have an ORC file on my local machine and need to convert it to any reasonable format (e.g. CSV, JSON, YAML, ...).

How can I convert ORC to CSV?


Solution

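    1. Download the Apache ORC source release
    2. Extract the files, change into the java folder and build with Maven: mvn install
    3. Use ORC-Tools (the orc-tools jar that the build installs into your local Maven repository)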

    This is how I run it; you will likely need to adjust the version number and the paths:

    java -jar ~/.m2/repository/org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar data ~/your_file.orc > output.json
    

    The output is in JSON Lines format, which is easy to convert to CSV. In my case I first had to remove the last two lines from the output file. Then:

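    import pandas as pd
    
    # Read the JSON Lines file (one JSON object per line)
    df = pd.read_json('output.json', lines=True)
    # Write the CSV; pass index=False to leave out the row-index column
    df.to_csv('output.csv')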
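
    If removing the trailing lines by hand is inconvenient, the clean-up and the conversion can also be combined using only the Python standard library. The following is a minimal sketch and not part of the original answer: `jsonl_to_csv` is a hypothetical helper name, and it assumes that every data row is a JSON object on its own line and that anything else (blank lines, trailing non-JSON lines) can safely be dropped.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path):
    """Convert a JSON Lines file to CSV, skipping lines that are not JSON objects.

    Returns the number of records written.
    """
    records = []
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                # Assumption: non-JSON lines are tool residue, not data
                continue
            if isinstance(obj, dict):
                records.append(obj)

    # Collect column names in first-seen order across all records
    columns = {}
    for rec in records:
        for key in rec:
            columns.setdefault(key, None)

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells
    return len(records)


if __name__ == "__main__":
    # File names match the ones used above; adjust as needed
    print(jsonl_to_csv("output.json", "output.csv"))
```

    Nested values (structs, lists, maps) are written using their Python string representation in this sketch, so a file with nested columns may need flattening first.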