Search code examples
pythondataframepysparkapache-spark-sqlconditional-statements

How to change specific value in a row, using dictionary and conditions?


I would like to change value in one specific row based on value from other columns in the same row. I tried with method .withColumn() but it doesn't work as I wanted. What I want to get - set flag 1 when the object has date in dataframe just like in the given dictionary:

objects_ = {'x': [datetime.date(2010, 1, 1), datetime.date(2012, 1, 9), datetime.date(2012, 11, 1)], 'y': [datetime.date(2010, 5, 23), datetime.date(2002, 4, 3)]}
date flag obj
2010-01-01 0 x
2010-02-03 0 x
2010-02-04 0 x
2010-05-23 0 y
2010-10-13 0 y

to

date flag obj
2010-01-01 1 x
2010-02-03 0 x
2010-02-04 0 x
2010-05-23 1 y
2010-10-13 0 y

Dates are datetime.date type and dataframe is pyspark's.

Flag for date 2010-05-23 would be 1 only for object y (not for x too) as its listed for y key in dictionary.

I've made something like this, but I failed. Perhaps, there is some method to avoid the loops?

for x in x_list:
    for y in x_dict[x]:
        df = df.withColumn("flag", when(df["date"] == y, 1).otherwise(df["flag"]))

Solution

  • you can create case whens for each of the keys and then pass them to a coalesce.

    here's the approach

    mapper = [func.when((func.col('obj') == k) & func.col('date').isin(v), 1) for k, v in objects_.items()]
    # [Column<'CASE WHEN ((obj = x) AND (date IN (DATE '2010-01-01', DATE '2012-01-09', DATE '2012-11-01'))) THEN 1 ELSE obj END'>,
    #  Column<'CASE WHEN ((obj = y) AND (date IN (DATE '2010-05-23', DATE '2002-04-03'))) THEN 1 ELSE obj END'>]
    
    data_sdf. \
        withColumn('flag_new', func.coalesce(*mapper, 'flag')). \
        show()
    
    # +----------+----+---+--------+
    # |      date|flag|obj|flag_new|
    # +----------+----+---+--------+
    # |2010-01-01|   0|  x|       1|
    # |2010-02-03|   0|  x|       0|
    # |2010-02-04|   0|  x|       0|
    # |2010-05-23|   0|  y|       1|
    # |2010-10-13|   0|  y|       0|
    # +----------+----+---+--------+