Search code examples
sqlhivereusability

How can I gain more composability / reusability in my Hive SQL scripts?


I'm developing several Hive-based ETLs, and have seriously violated the DRY (Don't Repeat Yourself) principle.

How can I gain more composability / reusability in my Hive SQL scripts?

Basically I'm looking for an INCLUDE or IMPORT command.

Any tips? For folks that write a ton of HiveQL scripts, how do you not repeat yourself?

Thanks in advance.


Solution

  • You just need to have your reusable hive script into small .HQL files. Create shell script for ordered execution of the .HQL files.

    All you need to make sure that the current executing script do have its pre-requisite completed. i.e. it should have expected table/schema.

    There are two things, which would help you.

    1) the hive -f abcd.hql command : Use a series of this command in the shell script for ordered execution of the scripts. e.g.

    hive -f specific.hql
    wait $! 
    hive -f reusable1.hql
    wait $! 
    hive -f specific2.hql
    wait $! 
    hive -f specific3.hql
    wait $! 
    hive -f reusable2.hql
    

    You might note the wait commands, basically to tell the shell to wait for last command to complete. This is very important when output table of the last script is input to the next one.

    2) Use hiveconf to customize table, schema (or even where conditions or column names) at runtime. e.g. create table '${hiveconf:schema.name}.my_table ...