I need to replace my outliers with nulls in PySpark:
df = df.withColumn("rpm", when(df["rpm"] >= 750, None).otherwise(df["rpm"]))
However, I get this error:
TypeError: condition should be a Column
Anky's comment above works, thanks: reference the column with col("rpm") instead of df["rpm"], with when and col imported from pyspark.sql.functions:
df.withColumn("rpm", when(col("rpm") >= 750, None).otherwise(col("rpm")))