apache-spark db2 pyspark bigsql

java.lang.ClassNotFoundException: com.ibm.db2.jcc.DB2Driver when connecting to BigSQL using Python


I'm new to PySpark. I'm using Python 3.5 and Spark 2.2.0 on Ubuntu 16.0. I wrote the following code to connect to BigSQL using PySpark:

from pyspark.sql.session import SparkSession
spark = SparkSession.builder.getOrCreate()

spark_train_df = spark.read.jdbc("jdbc:db2://my bigsq url :port number:sslConnection=true;sslTrustStoreLocation=ibm-truststore.jks;sslTrustStorePassword=*password123;","schema.Table Name",
             properties={"user": username, 
                      "password": password,
                      'driver' : 'com.ibm.db2.jcc.DB2Driver'}) # Trust store location is defined in .bashrc
spark_train_df.registerTempTable('data_table')

train_df = spark.sql('select * from data_table')

I have also added my trust store and driver paths to my .bashrc file, but when I run this code I get the following error message:

java.lang.ClassNotFoundException: com.ibm.db2.jcc.DB2Driver exception

Could an expert please guide me in solving this problem?


Solution

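  • You need to add the DB2 JDBC driver jar to your spark-submit (or spark-shell) command. Setting a path in .bashrc does not put the jar on Spark's classpath. For example, for Postgres: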

    spark-shell --master local[*] --packages  org.postgresql:postgresql:9.4.1207.jre7
    

    or, for DB2:

    spark-shell --master local[*] --jars /path/to/db2/jdbc/db2.jar
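If the script is started with plain python rather than through spark-submit, the same jar can usually be supplied through the spark.jars setting before the session is first created. The following is a minimal sketch, not a definitive fix: the helper names jdbc_jar_conf and build_session are hypothetical, the jar path is the placeholder from the answer above, and it assumes no SparkSession already exists in the process (settings applied after the JVM has started may be ignored).

```python
def jdbc_jar_conf(*jar_paths):
    """Return Spark settings that ship the given JDBC driver jars.

    spark.jars takes a comma-separated list of jar paths, the same
    value that would be passed to the --jars command-line option.
    """
    return {"spark.jars": ",".join(jar_paths)}


def build_session(*jar_paths):
    """Create a SparkSession with the driver jars configured (hypothetical helper)."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in jdbc_jar_conf(*jar_paths).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (placeholder path; point it at the real DB2 JDBC driver jar):
# spark = build_session("/path/to/db2/jdbc/db2.jar")
```

With a session built this way, the spark.read.jdbc call from the question can be used unchanged, keeping 'driver': 'com.ibm.db2.jcc.DB2Driver' in its properties.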