Tags: hive, oozie, hue

Is there a way to pass multiple values of the same variable into a Hive job in Hue?



I have a Hive query in Hue with one input variable, a string (for example a date like '20160117').
I'd like to execute this Hive query in Hue and pass it multiple values for that single variable.
Is it possible? If yes, how would you guys do it?


Solution

  • Oozie runs Directed Acyclic Graphs (DAGs), and "acyclic" means no loops, ever. But of course there are workarounds.

    So, if you must run the same HQL script exactly N times, each time with a different parameter value...

    • either copy/paste the Hive action N times, in a chain, each copy with a different param value (quick and dirty)
    • or build a Sub-Workflow containing just the Hive action and call it N times, in a chain, each time with a different param value
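To avoid doing the copy/paste by hand, the chained workflow can also be generated. The Java sketch below builds a workflow.xml with one Hive action per value, each action's `<ok>` transition pointing to the next one. It rests on several assumptions: the schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:hive-action:0.5`), the script name `query.hql`, the parameter name `dt`, the action names and the `${jobTracker}`/`${nameNode}` job properties are all illustrative choices, not taken from the question. Adjust them to your cluster.

```java
import java.util.Arrays;
import java.util.List;

public class ChainedHiveWorkflow {
    // Minimal XML escaping for values placed inside <param> elements.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a linear chain: start -> hive-0 -> hive-1 -> ... -> end.
    // Each action runs the same script with a different value for paramName.
    static String build(String script, String paramName, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("need at least one value");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<workflow-app name=\"hive-chain\" xmlns=\"uri:oozie:workflow:0.5\">\n");
        sb.append("  <start to=\"hive-0\"/>\n");
        for (int i = 0; i < values.size(); i++) {
            // The last action transitions to the end node instead of another action.
            String next = (i + 1 < values.size()) ? "hive-" + (i + 1) : "end";
            sb.append("  <action name=\"hive-").append(i).append("\">\n")
              .append("    <hive xmlns=\"uri:oozie:hive-action:0.5\">\n")
              .append("      <job-tracker>${jobTracker}</job-tracker>\n")
              .append("      <name-node>${nameNode}</name-node>\n")
              .append("      <script>").append(script).append("</script>\n")
              .append("      <param>").append(paramName).append('=')
              .append(esc(values.get(i))).append("</param>\n")
              .append("    </hive>\n")
              .append("    <ok to=\"").append(next).append("\"/>\n")
              .append("    <error to=\"fail\"/>\n")
              .append("  </action>\n");
        }
        sb.append("  <kill name=\"fail\"><message>Hive action failed</message></kill>\n");
        sb.append("  <end name=\"end\"/>\n");
        sb.append("</workflow-app>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("query.hql", "dt",
                Arrays.asList("20160117", "20160118", "20160119")));
    }
}
```

Inside query.hql the value would then be referenced as `${dt}`. Note that the number of actions is still fixed at generation time, so this only automates the "quick and dirty" static option.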

    On the other hand, if the number of executions and the parameter values must be determined dynamically, then you must implement the "loop" logic outside of Oozie proper...

    • for instance, start with a Shell action that creates an empty HQL file, appends N queries to it in a loop, then uploads the file to HDFS; next, a Hive action executes that HQL script as-is (quick and dirty, but not ideal for error handling)
    • or write a Java program that connects to HiveServer2 via JDBC, prepares a PreparedStatement with one bind variable, and executes it N times in a loop, binding a different value each time.
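The core of the "generate an HQL file" workaround is just a loop that repeats one query per value. The answer suggests a Shell action for this; the sketch below shows the same loop in Java instead, so it is a swapped-in language, not the answer's exact method. The query text, the `events`/`event_counts` tables, the `${dt}` placeholder and the output file name `generated.hql` are hypothetical. Uploading the file to HDFS (for example with `hdfs dfs -put`) and running it from a Hive action are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class HqlScriptBuilder {
    // Emits one statement per value, replacing every occurrence of the
    // placeholder with that value; each statement is terminated by ";".
    static String build(String template, String placeholder, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(template.replace(placeholder, v)).append(";\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical query; '${dt}' is replaced here, before Hive ever sees it.
        String template = "INSERT INTO TABLE event_counts "
                + "SELECT dt, COUNT(*) FROM events WHERE dt = '${dt}' GROUP BY dt";
        // Values come from the command line, with example dates as a fallback.
        List<String> values = Arrays.asList(
                args.length > 0 ? args : new String[] {"20160117", "20160118"});
        Path out = Paths.get("generated.hql");
        Files.write(out, build(template, "${dt}", values).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Because the values are pasted in as literals, they must be trusted (no quotes or other SQL inside them).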
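The JDBC variant can be sketched as follows: prepare the statement once, then bind and execute it in a loop. This is a minimal illustration, not a definitive implementation. It assumes the HiveServer2 JDBC driver (`org.apache.hive.jdbc.HiveDriver`, from the hive-jdbc jar) is on the classpath, and the query and the `events`/`event_counts` tables are hypothetical. Authentication details depend on your cluster and are not handled here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class HiveParamLoop {
    // Executes the same one-parameter statement once per value, sequentially.
    // The first failure throws SQLException and stops the loop.
    static int runForEach(Connection conn, String sql, List<String> values) throws SQLException {
        int runs = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String v : values) {
                ps.setString(1, v); // bind the single "?" placeholder
                ps.execute();
                runs++;
            }
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: HiveParamLoop <jdbc-url> <value> [<value> ...]");
            return;
        }
        // Load the HiveServer2 JDBC driver explicitly (hive-jdbc must be on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical statement with exactly one bind variable.
        String sql = "INSERT INTO TABLE event_counts "
                + "SELECT dt, COUNT(*) FROM events WHERE dt = ? GROUP BY dt";
        List<String> values = Arrays.asList(args).subList(1, args.length);
        // A URL such as jdbc:hive2://<host>:10000/default; credentials may be required.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            int runs = runForEach(conn, sql, values);
            System.out.println("Executed " + runs + " statement(s)");
        }
    }
}
```

Unlike the generated-script approach, each execution reports its own success or failure to the program, which makes error handling (retry, skip, abort) easier to control.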

    And maybe, someday, Hive will support some kind of procedural language similar to PL/SQL, T-SQL, PL/pgSQL, etc., and you will be able to pass a comma-separated list of values and process it inside Hive.