Search code examples
hivelocalcreate-table

Hive: create table and write it locally at the same time


Is it possible in hive to create a table and have it saved locally at the same time?

When I get data for my analyses, I usually create temporary tables to track eventual mistakes in the queries/scripts. Some of these are just temporary tables, while others contain the data that I actually need for my analyses.

What I do usually is using hive -e "select * from db.table" > filename.tsv to get the data locally; however when the tables are big this can take quite some time.

I was wondering if there is some way in my script to create the table and save it locally at the same time. Probably this is not possible, but I thought it is worth asking.


Solution

  • Honestly doing it the way you are is the best way out of the two possible ways but it is worth noting you can preform a similar task in an .hql file for automation.

    Using syntax like this:

    INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp' select * from table;
    

    You can run a query and store it somewhere in the local directory (as long as there is enough space and correct privileges)

    A disadvantage to this is that with a pipe you get the data stored nicely as '|' delimitation and new line separated, but this method will store the values in the hive default '^b' I think.

    A work around is to do something like this:

    INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp' 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    select books from table;
    

    But this is only in Hive 0.11 or higher