I'm intermittently getting the error message
DAG did not succeed due to VERTEX_FAILURE.
when running Hive queries via PyHive. Hive is running on an EMR cluster where hive.vectorized.execution.enabled
is set to false
in the hive-site.xml file for this reason.
I can set the above property through the configuration on the Hive connection and my query has run successfully every time I've executed it, however I want to confirm that this has fixed the issue and that it is definitely the case that hive-site.xml is being ignored.
Can anyone confirm if this is the expected behavior, or alternatively is there any way to inspect the Hive configuration via PyHive as I've not been able to find any way of doing this?
Thanks!
PyHive
is a thin client that connects to HiveServer2, just like a Java or C client (via JDBC or ODBC). It does not use any Hadoop configuration files on your local machine. The HS2 session starts with whatever properties are set server-side.
Same goes for ImPyla
BTW.
So it's your responsibility to set custom session properties from your Python code, e.g. execute this statement...
SET hive.vectorized.execution.enabled =False
... before running your SELECT
.