bigdata, partitioning, ab-initio

Ab Initio graph: Partition by Key behavior with Replicate


I have a question about the following case. Suppose I have a flow F that is replicated X times. Each replicated flow is then joined on the same key, but with a different dataset each time.

I want the joins to run in a parallel layout. In this case, do I need to use the "Partition by Key" component X times (one per Replicate output), or can I put a single one at the input of the Replicate?

TL;DR: Is this graph https://ibb.co/hHmk5e equivalent to https://ibb.co/i2NNJz, assuming all joins are on the same key?

Thank you,


Solution

  • Use Replicate feeding multiple Partition by Key components, one per Replicate output. Be careful with checkpoints: if you have 3 checkpoints after the Replicate, consider removing them and placing a single checkpoint before the Replicate.
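
To make the data-distribution side of the question concrete, here is a small plain-Python simulation. It is not Ab Initio code. The functions `partition_by_key` and `replicate` are hypothetical stand-ins, and the sketch assumes a deterministic hash on the key and the same number of partitions everywhere. Under those assumptions, each copy of the flow gets the same key-to-partition assignment whether you partition once before replicating or once per replicated copy. It says nothing about layouts, flow buffering, or checkpoint behavior in a real graph, which is what the advice above is about.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key, n_partitions):
    """Hypothetical stand-in: route each record to a partition
    using a deterministic hash of its key value."""
    partitions = defaultdict(list)
    for rec in records:
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        partitions[h % n_partitions].append(rec)
    return dict(partitions)


def replicate(records, copies):
    """Hypothetical stand-in: produce identical copies of a flow."""
    return [list(records) for _ in range(copies)]


# Illustrative data only: flow F, X copies, N partitions.
flow = [{"id": i, "value": f"v{i}"} for i in range(20)]
N = 4
X = 3

# Variant A: partition once, then replicate each partition's records.
partitioned_once = partition_by_key(flow, "id", N)
graph_a = [
    {p: list(recs) for p, recs in partitioned_once.items()}
    for _ in range(X)
]

# Variant B: replicate first, then partition each copy separately.
graph_b = [partition_by_key(copy, "id", N) for copy in replicate(flow, X)]

# Compare the per-copy, per-partition contents of both variants.
same_distribution = graph_a == graph_b
print(same_distribution)
```

The comparison only checks which records land in which partition for each copy. Whether the two graph shapes also behave the same at run time depends on the components and layouts involved, so treat this as an illustration of the hashing argument and nothing more.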