
How to write nested if else in pyspark?


I have a PySpark dataframe and I want to apply the following logic:

if col1 is not None:
    if col1 > 17:
        return False
    else:
        return True
return None

I have implemented it in the following way:

out = out.withColumn('col2', out.withColumn(
        'col2', when(col('col1').isNull(), None).otherwise(
            when(col('col1') > 17, False).otherwise(True)
        )))

However, when I run this I get the following error:

  assert isinstance(col, Column), "col should be Column"
AssertionError: col should be Column

Any ideas what I might be doing wrong?


Solution

  • I think the problem comes from the typo: you call out.withColumn twice, so the inner withColumn (which returns a DataFrame) is passed where the outer withColumn expects a Column, hence the AssertionError.

    Here is my code:

    from pyspark.sql import functions as F
    
    a = [
        (None,),
        (16,),
        (18,),
    ]
    
    b = [
        "col1",
    ]
    
    df = spark.createDataFrame(a, b)
    
    df.withColumn(
        "col2",
        F.when(F.col("col1").isNull(), None).otherwise(
            F.when(F.col("col1") > 17, False).otherwise(True)
        ),
    ).show()
    
    +----+-----+
    |col1| col2|
    +----+-----+
    |null| null|
    |  16| true|
    |  18|false|
    +----+-----+
    

    You can also do it a bit differently: you do not need the first otherwise, or you can skip checking for NULL explicitly:

    df.withColumn(
        "col2",
        F.when(F.col("col1").isNull(), None)
        .when(F.col("col1") > 17, False)
        .otherwise(True),
    ).show()
    
    # OR
    
    df.withColumn(
        "col2", F.when(F.col("col1") > 17, False).when(F.col("col1") <= 17, True)
    ).show()
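
    Note that in the last version, rows where col1 is null match neither when branch, and a when chain without an otherwise defaults to NULL, so you end up with the same result as the explicit isNull check. A rough sanity check on the sample data (a minimal sketch, reusing df and F from the snippet above):

    # Build both variants and confirm they produce the same rows on the sample
    # data: a null col1 matches neither when(), so col2 falls back to NULL.
    explicit_null = df.withColumn(
        "col2",
        F.when(F.col("col1").isNull(), None)
        .when(F.col("col1") > 17, False)
        .otherwise(True),
    )
    implicit_null = df.withColumn(
        "col2", F.when(F.col("col1") > 17, False).when(F.col("col1") <= 17, True)
    )
    
    # exceptAll treats NULLs as equal, so an empty difference in both directions
    # means the two results match exactly.
    assert explicit_null.exceptAll(implicit_null).count() == 0
    assert implicit_null.exceptAll(explicit_null).count() == 0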