bash, hive, hiveql, amazon-emr

Keep Hive session open on EMR


I’m running a bash script on AWS EMR that does something like:

for i in tab1 tab2 tab3 tab4
  do
   nrow=$(hive -e "select count(*) from $i")
  done

This is slow because a new Hive session has to be set up for each count. Is there a way to keep the session open throughout the loop?


Solution

  • Do all counts in a single statement, so that Hive is started only once. You can also generate the SQL statement instead of hardcoding it.

    Something like this:

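    # Run every count in one Hive invocation; -S is silent mode.
    output=$(hive -S -e "select 'tab1', count(*) from tab1
                         union all
                         select 'tab2', count(*) from tab2
                         union all
                         select 'tab3', count(*) from tab3")

    # Each result row holds a table name followed by its row count.
    echo "$output" | while read -r TABLE_NAME COUNT
    do
        echo "$TABLE_NAME $COUNT"
    done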
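
To generate the statement rather than hardcode it, something along these lines might work. It is only a sketch: the `tables` array and the `build_count_query` helper are made-up names for illustration, and the table names are assumed to be plain identifiers that need no quoting. If `hive` is not on the PATH, the script just prints the generated query so it can be inspected.

```shell
#!/usr/bin/env bash

# Hypothetical table list; replace with the real table names.
tables=(tab1 tab2 tab3 tab4)

# Hypothetical helper: joins one "select '<name>', count(*) from <name>"
# per argument with "union all" and prints the resulting statement.
build_count_query() {
  local query="" t
  for t in "$@"; do
    if [ -n "$query" ]; then
      query+=" union all "
    fi
    query+="select '$t', count(*) from $t"
  done
  printf '%s\n' "$query"
}

query=$(build_count_query "${tables[@]}")

if command -v hive >/dev/null 2>&1; then
  # A single hive invocation covers all tables.
  hive -S -e "$query" | while read -r table_name count; do
    echo "$table_name $count"
  done
else
  # Dry run: hive is not available, so only show the generated statement.
  echo "$query"
fi
```

Adding or removing a table then only means editing the `tables` array; the loop over results stays the same as in the hardcoded version.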