I have a dataframe like below -
|short_1|short_2|full_nm|
|"Ginder"|"Ginder v. Ginder"|"Carrie Diane GINDER v. Carl L. GINDER"|
|"KENNEY"|"KENNEY v. BARNHART"|""|
|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|""|
|""|""|""|
|"Rieser"|""|""|
I want to prefer full_nm column first(if it exists,get this value), then short_2(if full_nm doesn't exist,get this value) and then short_1(if full_nm and short_2 doesn't exist ,get this value),if all 3 values doesn't exist , keep it blank and create a new column based on this priority basis , like below pseudo code in pyspark-
if full_nm:
df['new'] = full_nm
elseif short_2:
df['new'] = short_2
elseif short_1:
df['new'] = short_1
else:
df['new'] = ''
Expected output -
|short_1|short_2|full_nm|new
|"Ginder"|"Ginder v. Ginder"|"Carrie Diane GINDER v. Carl L. GINDER"|"Carrie Diane GINDER v. Carl L. GINDER"|
|"KENNEY"|"KENNEY v. BARNHART"|""|"KENNEY v. BARNHART"|
|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|""|"United States v. $933,075.00 IN UNITED STATES CURRENCY"|
|""|""|""|""|
|"Rieser"|""|""|"Rieser"|
Any help would be appreciated.
Mask the empty values with nulls then coalesce
the masked columns in the required order
cols = ['full_nm', 'short_2', 'short_1']
masked_cols = [F.when(F.col(c) != '', F.col(c)) for c in cols]
df = df.withColumn('new', F.coalesce(*masked_cols)).fillna({'new': ''})
df.show()
+-------+--------------------+--------------------+--------------------+
|short_1| short_2| full_nm| new|
+-------+--------------------+--------------------+--------------------+
| Ginder| Ginder v. Ginder|Carrie Diane GIND...|Carrie Diane GIND...|
| KENNEY| KENNEY v. BARNHART| | KENNEY v. BARNHART|
| |United States v. ...| |United States v. ...|
| | | | |
| Rieser| | | Rieser|
+-------+--------------------+--------------------+--------------------+