I am trying to give a broadcast hint to the smaller table, but the physical plan still shows a SortMergeJoin.
spark.sql('select /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.serial_id = c.serial_id').explain()
Note: if I use created_date [partitioned column] instead of serial_id as my join condition, the plan shows a broadcast join:
spark.sql('select /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.created_date = c.created_date').explain()
Why does Spark behave this way with AWS Glue Catalog as my metastore?
In the BROADCAST hint you need to pass the alias of the table (since your SQL statement gives the table an alias). Try /*+ BROADCAST(c) */ * instead of /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ *:
spark.sql('select /*+ BROADCAST(c) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.serial_id = c.serial_id').explain()
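If you want to check the result in code rather than by reading the output, here is a rough sketch. The helper names (plan_text, uses_broadcast_join, broadcast_join) are made up for illustration, and it assumes explain() prints through Python's stdout, which is how PySpark does it as far as I know. The last function shows the same join through the DataFrame API with broadcast() from pyspark.sql.functions, which avoids the alias question altogether.

```python
import contextlib
import io


def plan_text(df):
    """Capture what df.explain() prints so the plan can be inspected as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def uses_broadcast_join(plan):
    """Return True if the plan text contains a broadcast join operator."""
    return "BroadcastHashJoin" in plan or "BroadcastNestedLoopJoin" in plan


def broadcast_join(spark):
    """Same join via the DataFrame API, marking the small table with broadcast()."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql.functions import broadcast

    t = spark.table("pratik_test_staging.crosswalk2016")
    c = spark.table("pratik_test_temp.crosswalk2016")
    return t.join(broadcast(c), t["serial_id"] == c["serial_id"])
```

With a live session you could then run uses_broadcast_join(plan_text(broadcast_join(spark))), or pass the DataFrame returned by the spark.sql(...) call with the alias hint to plan_text() in the same way.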