When I try to persist a DataFrame in PySpark, I get the error AttributeError: 'NoneType' object has no attribute 'persist'. Pseudocode is as follows:
from pyspark.sql import SparkSession

ss = SparkSession.builder.getOrCreate()
sqlDF = ss.sql(query)  # query contains a UDF
sqlDF.persist()
The result of ss.sql(query).show(10) looks like this:
+----------+---+--------------------+---+---+---+-----+---+
|        dt|  t|                   u| ra| tm|tra|  alg| fa|
+----------+---+--------------------+---+---+---+-----+---+
|2019-04-22|  1|0e4466fb752e0ff6a...|  2|   |   |h5_rl|   |
|2019-04-22|  1|05ce59ff55b70a805...|  4|   |   | null|   |
|2019-04-22|  1|07bc6ebd8f9d0082d...|  2|   |   |h5_rl|   |
+----------+---+--------------------+---+---+---+-----+---+
Is the error caused by some of the cell values being None? If so, how can I solve it?
You can try printing the schema of sqlDF with sqlDF.printSchema(). You may find that some column has type null (NullType), so Spark does not know how to serialize it. This happens when every value in a column is null, in which case Spark infers that column's type as NullType.
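For instance, a quick check in the PySpark shell (a minimal illustration, not your actual query; Spark 2.x reports the type as null, newer versions as void):

>>> ss.sql("SELECT NULL AS tm").printSchema()
root
 |-- tm: null (nullable = true)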
You can fix this by manually casting the column to a desired type in the query.
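A minimal sketch of the fix, assuming the all-null column is tm and a string type is wanted (both are assumptions for illustration):

from pyspark.sql import functions as F

# Cast the all-null column to an explicit type before persisting;
# the same can be done inside the SQL itself with CAST(tm AS STRING).
sqlDF = sqlDF.withColumn("tm", F.col("tm").cast("string"))
sqlDF.persist()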