I am trying to find out if I misconfigured something or am I hitting limits of single node redshift cluster?
I am using:
numRows
property according to the documentationI am trying to perform a spatial join 16m rows into 10m rows using ST_Contains()
and it just never finishes. I know that query is correct because it is able to join 16m rows with 2m rows in 6 seconds.
(query in Athena on same data completes in 2 minutes)
Case with 10m rows has been running for 60 minutes now and seems like it will just never finish. Any thoughts?
It eventually finishes but takes very long time. I understand that this is due to hitting unknown limit of rows where Redshift changes strategy of joining to Slow Strategy. This is how I understand these strategies:
Fast Strategy
Redshift takes table you are trying to join and sends it to every spectrum worker node resulting in a very fast join. Each worker performs join on its part of data and then sends results back.
Slow Strategy
Table you are trying to join is too big to be sent to a worker node. Each worker node sends its data back to main node and this node performs the join.
It has to perform join, then wait for data from another worker node, perform join, (...).
How to speed it up