etl datastage

DataStage: roll back data when one of the parallel jobs fails


I'm currently building a job that involves multiple parallel jobs. Each parallel job loads data into a database. If one fails halfway through, the data from the parallel jobs before it will already have been inserted into the database. Is there any way to roll back the data from all the parallel jobs if one fails halfway? Thank you.


Solution

  • No, rolling everything back is not the concept from a DataStage or ETL perspective.

    Some thoughts on that:

    • Undoing it all would mean you have to redo it all: the time already spent loading the data would be lost, and you would additionally spend a lot of time on the rollback itself.
    • If something fails, the concept is to restart from more or less exactly that point and retry loading the data. DataStage Sequences support this through checkpoints and restartable sequences.
    • Because of the data volumes usually handled by ETL tools, and because of DataStage's pipeline concept, there are very limited options for transaction handling across a whole job. It is usually limited to the "Load" part, where you can commit after, e.g., 2000 rows (so as not to cause log problems in the DB).
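
To illustrate the checkpoint-and-restart idea outside of DataStage, here is a minimal Python sketch. It is not DataStage code and does not use any DataStage API; the function `run_sequence`, the checkpoint file, and the step names are all hypothetical. It only shows the general pattern: record each step that finished, and on the next run skip those steps and resume at the one that failed.

```python
import json
import os


def run_sequence(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps already checkpointed.

    Hypothetical helper for illustration only. If a step raises, the exception
    propagates and the checkpoint file keeps the names of the steps that
    finished, so the next call resumes at the failed step.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    for name, step in steps:
        if name in done:
            continue  # finished in an earlier run; do not load it again
        step()  # may raise; a failed step is never recorded as done
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)

    # The whole sequence succeeded: clear the checkpoint so the next run starts fresh.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return done
```

With three steps where the second one fails, the first call records only the first step. After the cause is fixed, a second call with the same checkpoint file skips the first step and continues with the second. Note that the failed step itself may have committed part of its rows, so in practice that step has to be safe to re-run (for example by deleting its partial load first or by using upserts).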