
AttributeError: 'NoneType' object has no attribute 'persist'


When I try to persist a DataFrame in PySpark, I get the error AttributeError: 'NoneType' object has no attribute 'persist'. Pseudo code is as follows:

ss = SparkSession.builder.getOrCreate()
sqlDF = ss.sql(query)  # query contains a UDF
sqlDF.persist()

The result of ss.sql(query).show(10) looks like this:

+----------+---+--------------------+---+---+---+-----+---+
|        dt|  t|                   u| ra| tm|tra|  alg| fa|
+----------+---+--------------------+---+---+---+-----+---+
|2019-04-22|  1|0e4466fb752e0ff6a...|  2|   |   |h5_rl|   |
|2019-04-22|  1|05ce59ff55b70a805...|  4|   |   | null|   |
|2019-04-22|  1|07bc6ebd8f9d0082d...|  2|   |   |h5_rl|   |
+----------+---+--------------------+---+---+---+-----+---+

Is the error caused by some of the cell values being None? If so, how can I solve it?


Solution

  • Try printing the schema of sqlDF with sqlDF.printSchema(). You may find that some column has type NullType (shown as void), which Spark does not know how to serialize. This typically happens when every value in a column is null, so Spark infers the column's type as NullType. You can manually cast the column to the desired type in the query, e.g. CAST(col AS STRING).