Search code examples
pythonpysparkconditional-statementscase

create a new column based on three columns in pyspark dataframe


I have a dataframe like below -

|short_1|short_2|full_nm|
|"Ginder"|"Ginder v. Ginder"|"Carrie Diane GINDER v. Carl L. GINDER"|
|"KENNEY"|"KENNEY v. BARNHART"|""|
|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|""|
|""|""|""|
|"Rieser"|""|""|

I want to prefer full_nm column first(if it exists,get this value), then short_2(if full_nm doesn't exist,get this value) and then short_1(if full_nm and short_2 doesn't exist ,get this value),if all 3 values doesn't exist , keep it blank and create a new column based on this priority basis , like below pseudo code in pyspark-

if full_nm:
   df['new'] = full_nm
elseif short_2:
   df['new'] = short_2
elseif short_1:
  df['new'] = short_1
else:
  df['new'] = ''

Expected output -

|short_1|short_2|full_nm|new
|"Ginder"|"Ginder v. Ginder"|"Carrie Diane GINDER v. Carl L. GINDER"|"Carrie Diane GINDER v. Carl L. GINDER"|
|"KENNEY"|"KENNEY v. BARNHART"|""|"KENNEY v. BARNHART"|
|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|
|""|""|""|""|
|"Rieser"|""|""|"Rieser"|

Any help would be appreciated.


Solution

  • Mask the empty values with nulls then coalesce the masked columns in the required order

    cols = ['full_nm', 'short_2', 'short_1']
    masked_cols = [F.when(F.col(c) != '', F.col(c)) for c in cols]
                        
    df = df.withColumn('new', F.coalesce(*masked_cols)).fillna({'new': ''})
    

    df.show()
    +-------+--------------------+--------------------+--------------------+
    |short_1|             short_2|             full_nm|                 new|
    +-------+--------------------+--------------------+--------------------+
    | Ginder|    Ginder v. Ginder|Carrie Diane GIND...|Carrie Diane GIND...|
    | KENNEY|  KENNEY v. BARNHART|                    |  KENNEY v. BARNHART|
    |       |United States v. ...|                    |United States v. ...|
    |       |                    |                    |                    |
    | Rieser|                    |                    |              Rieser|
    +-------+--------------------+--------------------+--------------------+