On IBM DSX I have the following problem.
For the Spark 1.6 kernels on DSX it was (and still is) necessary to create new SQLContext objects in order to avoid issues with the metastore_db and the HiveContext: http://stackoverflow.com/questions/38117849/you-must-build-spark-with-hive-export-spark-hive-true/38118112#38118112
The following code snippets were written for Spark 1.6, and both run on Spark 2.0.2 but not on Spark 2.1:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, "a"), (2, "b"), (3, "c"), (4, "d")], ("k", "v"))
df.count()
and
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
properties = {
    'jdbcurl': 'JDBCURL',
    'user': 'USER',
    'password': 'PASSWORD'
}
data_df_1 = sqlContext.read.jdbc(properties['jdbcurl'], table='GOSALES.BRANCH', properties=properties)
data_df_1.head()
I get this error:
IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
However, when I execute the same code a second time, it works.
Instead of creating a new SQLContext with SQLContext(sc), use SQLContext.getOrCreate(sc). This returns the existing SQLContext if one already exists, and only creates a new one when none is present.
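For example, the JDBC snippet above can be adapted as follows (a minimal sketch; the JDBCURL, USER, and PASSWORD placeholders and the preconfigured sc come from the question):

from pyspark.sql import SQLContext

# Reuse the kernel's existing SQLContext rather than instantiating a new one;
# getOrCreate returns the current context if it exists, or creates it otherwise.
sqlContext = SQLContext.getOrCreate(sc)

properties = {
    'user': 'USER',        # placeholder credentials from the question
    'password': 'PASSWORD'
}

# Same JDBC read as in the question, now against the shared context
data_df_1 = sqlContext.read.jdbc('JDBCURL', table='GOSALES.BRANCH', properties=properties)
data_df_1.head()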