I have a Hive script with some hive conf variables set at the top. The query runs fine on our EMR cluster and returns the expected data. E.g.
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=10000;
set mapreduce.map.memory.mb=7168;
set mapreduce.reduce.memory.mb=7168;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.execution.engine=mr;
select
fruits,
count(1) as n
from table
group by fruits;
I would like to run this query on another server that has an ODBC connection to Hive.
(I'm using R.)
hive_conn <- DBI::dbConnect(odbc::odbc(), dsn = "Hive")
results <- DBI::dbGetQuery(hive_conn, "select fruits, count(1) as n from table group by fruits")
This runs fine and returns a data frame as expected.
However, I do not know how to send Hive configuration settings over ODBC.
How can I tell Hive, via ODBC, to run my query with my chosen hive conf settings?
I found the solution in the documentation for the driver: https://www.simba.com/products/Hive/doc/ODBC_InstallGuide/linux/content/odbc/hi/configuring/serverside.htm
I needed to add these 'server side properties' when creating the connection. Prepend each property name with the string 'SSP_' (server side property) and pass them as name-value pairs, e.g.:
hive_conn <- dbConnect(
  odbc(),
  dsn = "Hive",
  SSP_hive.execution.engine = "mr",
  SSP_hive.exec.dynamic.partition.mode = "nonstrict",
  SSP_hive.exec.dynamic.partition = "true",
  SSP_hive.exec.max.dynamic.partitions = 10000,
  SSP_mapreduce.map.memory.mb = 7168,
  SSP_mapreduce.reduce.memory.mb = 7168,
  SSP_hive.exec.max.dynamic.partitions.pernode = 10000,
  SSP_hive.exec.compress.output = "true",
  SSP_mapred.output.compression.codec = "org.apache.hadoop.io.compress.SnappyCodec"
)
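To sanity-check that the properties actually reached the session, you can query them back: in Hive, `set <property>` with no value echoes the current setting. A small sketch (assumes the `hive_conn` connection from above; output format depends on the driver):

```r
# Ask Hive for the current value of one of the properties we set.
# If the SSP_ options were applied, the result should contain
# something like "hive.execution.engine=mr" for this session.
check <- DBI::dbGetQuery(hive_conn, "set hive.execution.engine")
print(check)
```

This only inspects the session; it does not change any settings, so it's safe to run before kicking off the real query.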