hadoop hive hdfs hue

Run a script within a script? - Hive (and other QL's)


Is it possible to call a script and run it before running the rest of a script?

My goal is to run a set-up script that downloads and organizes the data needed by my main query.

I am looking for something like:

create table logcontent (content string) row format delimited fields terminated by '\n';

**call secondary hive script with date-range arguments and download necessary logs into <logcontent>**

**perform the rest of the query**

I want to do this in order to create a nice abstraction for the table set-up, so that the end user doesn't have to worry about it; it will be done for them.

I know that AWS has the option to add a Hive script as a step in the job but how can I do the same thing locally? Is this possible? If so, what is the syntax? If not, what are some work-arounds?


Solution

  • One approach is to drive everything from a main shell script, organized along the lines of the template below.

    ## Contents of main.sh
    
    ## Set up the Hadoop environment and add its config to PATH here, if not already done.
    
    ## Step 1: Create the Hive table in non-interactive mode.
    hive -e "create table test(id int, name string) row format delimited fields terminated by '\n'"
    # Check the exit status of the previous command and stop if the table was not created.
    if [ $? -ne 0 ]; then
        echo "Table creation failed" >&2
        exit 1
    fi
    
    ## Step 2: Call the secondary script to download the logs.
    ksh downloadlogs.sh  # Assumes the download script can be invoked this way.
    
    ## Step 3: Run the remaining Hive queries to organize the data.
    hive -e "select * from test"
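
The question also asks about passing date-range arguments to the set-up step. A minimal sketch of one way to do that from the same kind of wrapper is shown here, assuming the set-up and main queries live in two files. The file names `setup.hql` and `main.hql`, the variable names `start_date` and `end_date`, the `HIVE_BIN` override, and both function names are hypothetical and not part of the original answer. It relies on the Hive CLI's `-f` option (run a script file) and `--hivevar key=value` (define a variable that the script reads as `${hivevar:key}`).

```shell
#!/bin/sh
# Sketch only: setup.hql, main.hql, start_date, end_date and HIVE_BIN
# are hypothetical names chosen for illustration.

# Allow the Hive executable to be overridden (e.g. for a dry run).
HIVE_BIN="${HIVE_BIN:-hive}"

# Run one Hive script file, passing the date range as Hive variables.
# Inside the .hql file the values are available as
# ${hivevar:start_date} and ${hivevar:end_date}.
#   $1 = script file, $2 = start date, $3 = end date
run_hive_script() {
    "$HIVE_BIN" --hivevar start_date="$2" --hivevar end_date="$3" -f "$1"
}

# Run the set-up script first; run the main query only if set-up
# exited with status 0.
#   $1 = start date, $2 = end date
run_pipeline() {
    run_hive_script setup.hql "$1" "$2" || {
        echo "setup.hql failed; skipping main.hql" >&2
        return 1
    }
    run_hive_script main.hql "$1" "$2"
}

# Example invocation (uncomment to run against a real Hive install):
# run_pipeline 2020-01-01 2020-01-31
```

With this layout the end user only calls `run_pipeline` with a date range and never touches the table set-up. If the goal is to stay inside a single Hive session instead, the Hive CLI also has a `source <filepath>` command that executes another script file from within the current one.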