This simple PySpark snippet runs fine with a normal spark-submit but fails in Apache Zeppelin on the cast
call. Any ideas?
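%pyspark
import pyspark.sql.functions as spark_functions
from pyspark.sql.types import StringType
# Build a literal NULL column and confirm its type
col1 = spark_functions.lit(None)
print("type(col1)={}".format(type(col1)))
# This cast is the call that fails in Zeppelin
col2 = col1.cast(StringType())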
The error is:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6046223946582899049.py", line 252, in <module>
    eval(compiledCode)
  File "<string>", line 14, in <module>
  File "/usr/lib/spark/python/pyspark/sql/column.py", line 334, in cast
    jdt = ctx._ssql_ctx.parseDataType(dataType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'
This is a known bug with Spark 2.0 on Zeppelin 0.6.1 that is targeted to be fixed in Zeppelin 0.6.2: https://issues.apache.org/jira/browse/ZEPPELIN-1411