apache-spark, pyspark, apache-zeppelin

Zeppelin PySpark: 'JavaMember' object has no attribute 'parseDataType'


This simple PySpark snippet runs fine with normal spark-submit but fails with Apache Zeppelin on the cast call. Any ideas?

%pyspark
import pyspark.sql.functions as spark_functions
from pyspark.sql.types import StringType  # required for StringType() below

col1 = spark_functions.lit(None)
print("type(col1)={}".format(type(col1)))
col2 = col1.cast(StringType())

The error is:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6046223946582899049.py", line 252, in <module>
    eval(compiledCode)
  File "<string>", line 14, in <module>
  File "/usr/lib/spark/python/pyspark/sql/column.py", line 334, in cast
    jdt = ctx._ssql_ctx.parseDataType(dataType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'

Solution

  • This is a known bug with Spark 2.0 on Zeppelin 0.6.1 that is targeted to be fixed in Zeppelin 0.6.2: https://issues.apache.org/jira/browse/ZEPPELIN-1411
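Until an upgrade is possible, one possible workaround is to pass the type name as a string instead of a DataType instance. The traceback shows the failure inside the branch of Column.cast that calls ctx._ssql_ctx.parseDataType for DataType arguments; as far as I can tell from the Spark 2.0 source, a string argument is handed straight to the JVM column and never touches that context. The sketch below assumes that behavior and has not been verified on Zeppelin 0.6.1. cast_by_name is a hypothetical helper, not part of PySpark.

```python
def cast_by_name(column, data_type):
    """Cast a column using a type-name string rather than a DataType object.

    Hypothetical helper (not a PySpark API). Assumption: Column.cast accepts
    a string such as "string" and, for string arguments, skips the
    ctx._ssql_ctx.parseDataType call that fails in the traceback above.
    """
    if isinstance(data_type, str):
        # Already a type name, e.g. "string" or "int".
        type_name = data_type
    else:
        # DataType.simpleString() yields the short name, e.g. "string"
        # for StringType().
        type_name = data_type.simpleString()
    return column.cast(type_name)
```

In the %pyspark paragraph this would be used as col2 = cast_by_name(col1, StringType()), or more simply col2 = col1.cast("string").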