Tags: python, apache-spark, pyspark

Stop pyspark aggregation if condition triggers


Let's say I want to check whether a PySpark DataFrame has any constant columns, working with the DataFrame from this question:

+----------+----------+
|    A     |    B     |
+----------+----------+
|       2.0|       0.0|
|       0.0|       0.0|
|       1.0|       0.0|
|       1.0|       0.0|
|       0.0|       0.0|
|       1.0|       0.0|
|       0.0|       0.0|
+----------+----------+

Is there a way to generate:

+----------+----------+
|    A     |    B     |
+----------+----------+
|     False|      True|
+----------+----------+

without having to aggregate/filter the whole A column, as proposed in the solution to that question? (Say, if I detect during aggregation that two rows aren't equal, stop the operation and return False, thus saving time.) Does Spark do this internally?
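
For reference, a minimal sketch of the full-scan aggregation that kind of solution relies on (the column names and data are copied from the table above; the exact expressions are my own, not the linked answer's):

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(2.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0),
         (0.0, 0.0), (1.0, 0.0), (0.0, 0.0)],
        ["A", "B"],
    )

    # A column is constant exactly when its min equals its max,
    # but computing min/max reads every row of every column.
    df.select([(F.min(c) == F.max(c)).alias(c) for c in df.columns]).show()
    # prints something like:
    # +-----+----+
    # |    A|   B|
    # +-----+----+
    # |false|true|
    # +-----+----+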


Solution

  • No. Spark has no short-circuit feature for aggregations; an aggregate such as min, max, or countDistinct always scans every row of the column.
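
    That said, take(1) evaluates partitions incrementally (Spark scans a few partitions first and only scales up if it hasn't collected enough rows), so a hand-rolled check can sometimes stop early on a non-constant column. A sketch under that assumption, reusing df from above; is_constant is a hypothetical helper, not a Spark API, and it ignores nulls:

        import pyspark.sql.functions as F

        def is_constant(df, colname):
            # Hypothetical helper, not part of Spark's API.
            # Grab one reference value, then look for any row that differs.
            ref = df.select(colname).first()[0]
            # take(1) runs the scan over an increasing number of partitions,
            # so on a non-constant column it can stop as soon as one
            # differing row turns up; a constant column is still read fully.
            return len(df.filter(F.col(colname) != ref).take(1)) == 0

        print({c: is_constant(df, c) for c in df.columns})
        # {'A': False, 'B': True}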