I am using Spark SQL 2.3.1 and I set
spark.sql.shuffle.partitions=40
In my code:
val partitioned_df = vals_df.repartition(col("model_id"),col("fiscal_year"),col("fiscal_quarter"))
When I call
println("Number of partitions: " + partitioned_df.rdd.getNumPartitions)
it prints 40. In fact, after repartitioning by those three columns I expected the count to be around 400 (roughly one per distinct key combination). Why is repartition not working here? What am I doing wrong, and how do I fix it?
set spark.sql.shuffle.partitions=40
My understanding used to be that this applies to JOINs and aggregations only, but it is actually the default partition count for any shuffle: repartition(col, ...) with no explicit number falls back to it, which is why you get exactly 40. Repartitioning by columns hash-partitions the rows into that many buckets; it does not create one partition per distinct key combination. To get a different count, pass it explicitly.
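A minimal sketch of the fallback behaviour, assuming a local SparkSession (the session setup and sample data are my own, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical local session, just to illustrate the default.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.shuffle.partitions", "40")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, 2019, 1), (2, 2019, 2))
  .toDF("model_id", "fiscal_year", "fiscal_quarter")

// No explicit count: falls back to spark.sql.shuffle.partitions, i.e. 40.
val byCols = df.repartition(col("model_id"), col("fiscal_year"), col("fiscal_quarter"))
println(byCols.rdd.getNumPartitions)   // 40

// Explicit count overrides the setting.
val byCount = df.repartition(400, col("model_id"), col("fiscal_year"), col("fiscal_quarter"))
println(byCount.rdd.getNumPartitions)  // 400
```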
Try something like this (my own example):
val df2 = df.repartition(40, $"c1", $"c2")
Here is the output of
val df2 = df.repartition(40, $"c1", $"c2").explain
== Physical Plan ==
Exchange hashpartitioning(c1#114, c2#115, 40)
...
You can also set the number of partitions dynamically (note that .explain returns Unit, so don't assign its result):
val n = ... // some calculation
val df2 = df.repartition(n, $"c1", $"c2")
df2.explain
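One way to compute n, for instance (this heuristic and the cap of 2000 are my own assumption, not from the original answer), is to size it from the number of distinct key combinations:

```scala
// Hypothetical: derive the partition count from the distinct keys,
// capped so a high-cardinality key set doesn't create tiny partitions.
val distinctKeys = df.select($"c1", $"c2").distinct().count()
val n = math.min(distinctKeys, 2000L).toInt
val df2 = df.repartition(n, $"c1", $"c2")
```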