I’m running a bash script on AWS EMR that does something like:
for i in tab1 tab2 tab3 tab4
do
  nrow=$(hive -e "select count(*) from $i")
done
This is slow because a new Hive session has to be set up for each count. Is there a way to keep the session open throughout the loop?
Do all the counts in a single statement, so that only one Hive session is started. You can also generate the SQL statement instead of hardcoding it.
Something like this:
output=$(hive -S -e "select 'tab1', count(*) from tab1
union all
select 'tab2', count(*) from tab2
union all
select 'tab3', count(*) from tab3")
echo "$output" | while read -r TABLE_NAME COUNT
do
  echo "$TABLE_NAME $COUNT"
done
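Generating the statement rather than hardcoding it could look roughly like the sketch below. It assumes bash (arrays and += are used), and build_count_sql is a hypothetical helper name, not something Hive provides. The table names are the ones from the question. The hive call is guarded so the script still prints the generated SQL on a machine without Hive.

```shell
#!/usr/bin/env bash
# Table names taken from the question; replace with your own list.
tables=(tab1 tab2 tab3 tab4)

# Hypothetical helper: prints one "select '<t>', count(*) from <t>"
# per argument, joined by "union all".
build_count_sql() {
  local sql="" t
  for t in "$@"; do
    if [ -n "$sql" ]; then
      sql+=$'\nunion all\n'
    fi
    sql+="select '$t', count(*) from $t"
  done
  printf '%s\n' "$sql"
}

sql=$(build_count_sql "${tables[@]}")
echo "$sql"

# Run every count in a single Hive session (skipped if hive is not on PATH).
if command -v hive >/dev/null 2>&1; then
  hive -S -e "$sql" | while read -r table_name count; do
    echo "$table_name $count"
  done
fi
```

Table names are interpolated into the SQL unquoted, so this is only suitable for a trusted, fixed list of names.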