Search code examples
for-looppysparksubset

Filling null and not null values as 0 and 1, respectively to only selected columns in pyspark dataframe


For a sample spark df:

sdf = ({'id_col': [25, 13, 15],
                   'x2': ['b', None, '2'],
                   'x3': [None, '0', '3'] })

which I would to have:

sdf = ({'id_col': [25, 13, 15],
                   'x2': [1, 0, 1],
                   'x3': [0, 1, 1] })

replace all null with 0, and notnull with 1.

Following the syntax from: Filling not null values as 1 in pyspark dataframe. it went trough all the columns of the sdf. For my case, I need to keep the values of id_col as is but replace with 1 and 0 on cols x2 and x3.

I initially tried the following:

I put the x2 and x3 cols in a list:

list = ['x2','x3']

then applied this:

col_selection = [when(col(c).isNull(),0).otherwise(1).alias(c) for c in sdf[[list]].columns]

I get the intended 0s and 1s, but I only get the x2 and x3 columns. When I called the sdf, the changes did not happen. If I have 17 columns to go through in a 25-column sdf, can you guide me on how to apply the for loop on only the 17 columns?

Thanks


Solution

  • If possible to do one by one

    from pyspark.sql import functions as F
    import pandas as pd
    sdf = pd.DataFrame({'id_col': [25, 13, 15],
                       'x2': ['b', None, '2'],
                       'x3': [None, '0', '3'] })
    sdf=spark.createDataFrame(sdf)
    sdf.withColumns({"x2":F.when(F.col("x2").isNull(),0).otherwise(1),"x3":F.when(F.col("x3").isNull(),0).otherwise(1)}).show()
    
    #output
    +------+---+---+
    |id_col| x2| x3|
    +------+---+---+
    |    25|  1|  0|
    |    13|  0|  1|
    |    15|  1|  1|
    +------+---+---+
    

    Note-withColumns is only available in spark version >= 3.3.0

    To do in loop

    lis=['x2','x3']
    for a in lis:
      sdf=sdf.withColumn(f"{a}",F.when(F.col(f"{a}").isNull(),0).otherwise(1))
    sdf.show()
    
    #output
    +------+---+---+
    |id_col| x2| x3|
    +------+---+---+
    |    25|  1|  0|
    |    13|  0|  1|
    |    15|  1|  1|
    +------+---+---+