
Adding a conditional filter clause


This is my dataframe and filters:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(3)

filter1 = (F.col('id') == 1)
filter2 = (F.col('id') == 2)
flag = False

I want to make filter2 dependent on flag: when flag = True, filter2 should take effect, and when flag = False, it should not.

I know I can do

filt = filter1
if flag:
    filt = filt | filter2
df = df.filter(filt)

I wonder if this would be possible in one line, for example using a conditional expression (ternary operator).

This did not work (with flag = False, the conditional expression evaluates to the plain Python bool False, which filter() rejects):

df = df.filter(filter1 | filter2 if flag else False)

Desired result when flag = False:

+---+
| id|
+---+
|  1|
+---+

Desired result when flag = True:

+---+
| id|
+---+
|  1|
|  2|
+---+

Solution

  • You can combine filter2 with the flag using a bitwise & (AND). Since flag is a plain Python bool, wrap it in F.lit() to turn it into a Column; the combined condition is True only when both filter2 and flag are True, and False otherwise. Then join it to filter1 with an | (OR) condition.

    output = df.filter(filter1 | (filter2 & F.lit(flag)))
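
As a quick check, here is a minimal, self-contained sketch (the SparkSession setup is assumed, not taken from the question) that runs the same one-line filter with both flag values and reproduces the desired results:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(3)

    filter1 = (F.col('id') == 1)
    filter2 = (F.col('id') == 2)

    for flag in (False, True):
        # F.lit(flag) lifts the Python bool into a Column, so
        # filter2 is switched on or off by the value of flag
        output = df.filter(filter1 | (filter2 & F.lit(flag)))
        output.show()  # flag=False -> only id 1; flag=True -> ids 1 and 2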