Search code examples
apache-sparkspark-streamingwindow-functions

Where is executed the Apache Spark reductionByWindow function?


I try to learn apache spark and I can't understand from documentation how window operations work.

I have two worker node and I use Kafka Spark Utils to create DStream from a Topic.

On this DStream I apply map function and a reductionByWindow.

I can't understand if reductionByWindow is executed on a each worker or in the driver.

I have searched on google without any result.

Can Someone explain me?


Solution

  • Both receiving and processing data happens on the worker nodes. Driver creates receivers (on worker nodes) which are responsible for data collection, and periodically starts jobs to process collected data. Everything else is pretty much standard RDDs and normal Spark jobs.