1). WholeStageCodeGen -> Exchange 2). Exchange -> WholeStageCodeGen -> SortAggregate -> Exchange
Whole-stage code generation is a technique inspired by modern compilers to collapse the entire query into a single function.
Prior to whole-stage code generation, each physical plan is a class with the code defining the execution. With whole-stage code generation, all the physical plan nodes in a plan tree work together to generate Java code in a single function for execution. This Java code is then turned into JVM bytecode using Janino, a fast Java compiler. Then JVM JIT kicks in to optimize the bytecode further and eventually compiles them into machine instructions.
For example
== Physical Plan ==
*Project [id#27, token#28, token#6]
+- *SortMergeJoin [id#27], [id#5], Inner
:- *Sort [id#27 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(id#27, 200)
Where ever you see *, it means that wholestagecodegen
has generated hand-written code before the aggregation.
Exchange means the Shuffle Exchange between jobs. Exchange does not have whole-stage code generation because it sends data across the network.