I have two large tables A and B, and I want to join these two tables on two columns, say, project_id and customer_id.
When I do the join in Apache Ignite, I find that the performance is very bad. After investigating, I think the problem lies in that the data recide in different nodes randomly.
When the join happens, there are data transfer between nodes to make the same project_id and customer_id from A and B into same node.
For my case,
I would ask which solution would be a better choice, thanks!
The former (1.) is recommended. You should load the data in the fashion so that data for the same project_id
and customer_id
is on the same node in both tables.
This is called affinity collocation and it is paramount to get right to have good performance of Ignite queries (and sometimes for them to work at all).
Ignite will take care of affinity collocation for you if you setup it correctly, but there are a few caveats right away:
project_id
and customer_id
) or a composite type (a nested POJO with its own implications) or a synthetic value maybe?