Tags: python, apache-spark, pyspark, outliers

PySpark - Replace Value with Null Conditionally


I need to replace my outliers with nulls in PySpark:

df = df.withColumn("rpm", when(df["rpm"] >= 750, None).otherwise(df["rpm"]))

However, I get this error:

TypeError: condition should be a Column

Solution

  • Anky's comment above works. Thanks.

    from pyspark.sql.functions import col, when

    df = df.withColumn("rpm", when(col("rpm") >= 750, None).otherwise(col("rpm")))
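
For context, here is a minimal self-contained sketch of the fix in action; the local SparkSession setup and the sample rpm readings are assumptions for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, when

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Hypothetical sample data: a few rpm readings, some at or above the 750 cutoff.
    df = spark.createDataFrame([(700,), (750,), (820,), (430,)], ["rpm"])

    # Treat values >= 750 as outliers and replace them with null.
    df = df.withColumn("rpm", when(col("rpm") >= 750, None).otherwise(col("rpm")))

    df.show()
    # Expected output (null rendering varies by Spark version):
    # +----+
    # | rpm|
    # +----+
    # | 700|
    # |null|
    # |null|
    # | 430|
    # +----+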