I am trying to execute a Hive query from my Spark code. The table was created with a custom storage-handler jar, so I need to add that jar before I can query the table. My Spark code:
val hiveContext = ...
// Register the storage-handler jar with the Hive session, then query the table
hiveContext.sql("ADD JAR hive-jdbc-handler-2.3.4.jar")
hiveContext.sql("SELECT * FROM TABLE")
Following this previous question: How to add jar using HiveContext in the spark job, I added the following parameter to my spark-submit command:
--jar "LOCAL PATH to hive-jdbc-handler-2.3.4.jar"
In the logs of my application, I am getting the following messages:
18/08/02 14:10:41,271 | INFO | 180802140805 | SessionState | Added [hive-jdbc-handler-2.3.4.jar] to class path
18/08/02 14:10:41,271 | INFO | 180802140805 | SessionState | Added resources: [hive-jdbc-handler-2.3.4.jar]
18/08/02 14:10:42,179 | ERROR | 180802140805 | org.apache.hive.storage.jdbc.dao.GenericJdbcDatabaseAccessor | Error while trying to get column names.
org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'org.postgresql.Driver'
Note that I want to execute my application on a cluster. What can I do?
The way I was trying to add the jar for use in Spark was correct (there is no need to use the "addFile" method in cluster mode). The error I was getting was caused by the jar I was using being corrupted; I replaced it with a new one and it worked.
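As a side note, one quick way to check whether a jar is corrupted is to list its contents; the command fails if the archive is damaged (this assumes the JDK's jar tool is on the PATH):

jar tf hive-jdbc-handler-2.3.4.jar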