Tags: apache-spark, pyspark, data-science-experience

New SQLContext: Spark 1.6 backward-compatibility with Spark 2.1


On IBM DSX I have the following problem.

For the Spark 1.6 kernels on DSX it is necessary to create new SQLContext objects to avoid issues with metastore_db and HiveContext: http://stackoverflow.com/questions/38117849/you-must-build-spark-with-hive-export-spark-hive-true/38118112#38118112

The following code snippets were written for Spark 1.6 and both run under Spark 2.0.2, but not under Spark 2.1:

from pyspark.sql import SQLContext

# Create a fresh SQLContext from the existing SparkContext (sc)
sqlContext = SQLContext(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b"), (3, "c"), (4, "d")], ("k", "v"))
df.count()

and

from pyspark.sql import SQLContext

# Create a fresh SQLContext from the existing SparkContext (sc)
sqlContext = SQLContext(sc)

# Connection details; the 'jdbcurl' key doubles as the URL argument below
properties = {
    'jdbcurl': 'JDBCURL',
    'user': 'USER',
    'password': 'PASSWORD'
}

data_df_1 = sqlContext.read.jdbc(properties['jdbcurl'], table='GOSALES.BRANCH', properties=properties)
data_df_1.head()

I get this error:

IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

However, when I execute the same code a second time, it works.


Solution

  • Instead of creating a new SQLContext with SQLContext(sc), use SQLContext.getOrCreate(sc). This returns the existing SQLContext if one already exists and creates a new one otherwise, as shown in the sketch below.
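
A minimal sketch of the fix applied to the first snippet, assuming sc is the SparkContext already provided by the notebook kernel:

from pyspark.sql import SQLContext

# getOrCreate reuses the SQLContext tied to this SparkContext if one
# exists, instead of instantiating a second one (which triggers the
# HiveSessionState error on Spark 2.1)
sqlContext = SQLContext.getOrCreate(sc)

df = sqlContext.createDataFrame([(1, "a"), (2, "b"), (3, "c"), (4, "d")], ("k", "v"))
df.count()

The same one-line change applies to the JDBC snippet: replace SQLContext(sc) with SQLContext.getOrCreate(sc) and leave the rest of the code unchanged.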