Run a script within a script? - Hive (and other QL's)

Is it possible to call a script and run it before running the rest of a script?

My goal is to run a set-up script that downloads and organizes the data needed for my main query.

I am looking for something like:

create table logcontent (content string) row format delimited fields terminated by '\n';

**call secondary hive script with date-range arguments and download necessary logs into <logcontent>**

**perform the rest of the query**

I want to do this to create a clean abstraction for the table set-up, so that the end user doesn't have to worry about it; it will be done for them.

I know that AWS has the option to add a Hive script as a step in the job but how can I do the same thing locally? Is this possible? If so, what is the syntax? If not, what are some work-arounds?

Solution

The answer is to structure your main shell script along the lines of the template below.

## Content of main.sh

## Code block to set up the Hadoop environment and add its config to PATH, if not already done.

## Step 1> Create the hive table in non-interactive mode.
hive -e "create table test(id int, name string) row format delimited fields terminated by '\n'"
# Check whether the command succeeded; stop here if it did not.
rc=$?
if [ "$rc" -ne 0 ]; then echo "table creation failed (exit code $rc)" >&2; exit "$rc"; fi

## Step 2> Call the secondary script executable to download logs
ksh downloadlogs.sh # Assuming the download script could be invoked this way.

## Step 3> Execute rest of the hive queries to organize data
hive -e "select * from test"
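If the secondary step is itself a Hive script rather than a shell script, the date-range arguments from the question can be passed with the Hive CLI's `--hivevar` option and the script run with `-f`. The following is only a sketch under assumptions: the file names `setup.hql` and `main.hql`, the variable names `start_date` and `end_date`, and the `HIVE_BIN` override are all hypothetical and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical wrapper: setup.hql and main.hql are placeholder file names.
# HIVE_BIN is an assumed override so the commands can be dry-run (HIVE_BIN=echo).
HIVE_BIN="${HIVE_BIN:-hive}"

run_hive_script() {
    # $1 = script file, $2 = start date, $3 = end date
    # --hivevar defines a variable; -f runs the statements in a file.
    "$HIVE_BIN" --hivevar start_date="$2" --hivevar end_date="$3" -f "$1"
}

run_pipeline() {
    # Run the set-up script first; run the main query only if it succeeded.
    run_hive_script setup.hql "$1" "$2" || return 1
    run_hive_script main.hql "$1" "$2"
}
```

Calling `run_pipeline 2014-01-01 2014-01-31` would then run both scripts in order, and inside each `.hql` file the values can be referenced as `${hivevar:start_date}` and `${hivevar:end_date}`. The Hive CLI also documents a `source` command for running a script file from within an interactive session, which may fit if a shell wrapper is not wanted.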