Tags: sql, database, db2, archiving

How to move records from one DB2 database to another DB2 database?


At regular times we want to clean up (delete) records from our production DB (DB2) and move them to an archive DB (also a DB2 database with the same schema).

To complete the story: there are plenty of foreign key constraints in our DB. So if record b in table B has a foreign key to record a in table A, and we delete record a in the production DB, then record b must also be deleted from the production DB, and both records must be created in the archive DB.

Of course it is very important that no data gets lost: it must not be possible for records to be deleted from the production DB without ever being inserted into the archive DB.

What is the best approach to do this?

FYI I have checked https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dm.doc/doc/r0024482.html and the proposed solutions have the following shortcomings.

  • Load utility, Ingest utility, Import utility: these only address inserting records into the archive DB. They don't cover the full move.
  • Export utility: only covers exporting the data (which can then be imported by the Import utility).
  • db2move, restore command, db2relocatedb, ADMIN_COPY_SCHEMA, ADMIN_MOVE_TABLE and split mirror: these are not options if you only want to move specific records meeting a certain condition to the archive DB.

So based on my research, the current best solution seems to be an in-house developed script that:

  1. Exports the records to move in IXF format
  2. Imports those exported records into the archive DB
  3. Deletes those records from the production DB

To avoid transaction-log-full errors, this script should work in batches (e.g. of 50,000 records).
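The batching itself can be as simple as slicing the list of candidate keys so that each export/import/delete transaction stays small. A minimal sketch (the helper name and batch size are illustrative, not part of any Db2 tooling):

```python
def batches(keys, size):
    """Yield successive chunks of `keys` so each transaction stays small
    and the transaction log never has to hold the whole job."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# e.g. process 50,000 primary keys per transaction:
# for chunk in batches(all_keys_to_archive, 50000):
#     export / import / delete for `chunk`, then commit
```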

To avoid foreign key constraint errors in step 3, we must also ensure that step 1 exports all records having a foreign key constraint to the exported records, all records having a foreign key constraint to those records, and so on.
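Collecting "all records having a foreign key to the exported records, and all records having a foreign key to those records, and so on" is a transitive-closure walk over the FK graph. A sketch of that walk, assuming a hypothetical `children_of` lookup that returns the rows referencing a given row (in practice this would be one query per FK relationship):

```python
from collections import deque

def dependent_closure(seed_rows, children_of):
    """Return every row that must move together with `seed_rows`.

    seed_rows:   set of (table, pk) tuples initially selected for archiving.
    children_of: callable mapping (table, pk) -> iterable of (table, pk)
                 rows that hold a foreign key to it (hypothetical lookup).

    Breadth-first walk: any row referencing a row in the closure is
    pulled in too, so deleting the closure violates no FK constraint.
    """
    todo = deque(seed_rows)
    closure = set(seed_rows)
    while todo:
        row = todo.popleft()
        for child in children_of(row):
            if child not in closure:
                closure.add(child)
                todo.append(child)
    return closure
```

When deleting, process the closure children-first (or use `ON DELETE CASCADE` where the schema allows it); when inserting into the archive, process it parents-first.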


Solution

  • Questions that ask for the "best" approach have limited use because the assessment criteria are omitted. Those criteria often differ between technicians and business people.

    Sometimes multiple policies of the client company determine such criteria, so awareness of local policies, procedures and patterns is crucial.

    Often the operational, security and licensing requirements will influence the approach, apart from the skill level and experience of the implementation team.

    Occasionally corporations have specific standardised tools for archival and deletion, or specific patterns, sometimes influenced by the industry sector or even industry-specific regulatory requirements.

    As Stack Overflow is a programming-oriented website, questions like yours can be considered off-topic: you are asking for advice about which design approaches are possible while omitting a lot of context, specific to your company or industry sector, that may well influence the solution pattern.

    Some typical requirements or questions that influence the approach are below:

    • do local security requirements allow the data to leave the Db2 environment (i.e. to be stored on disk outside of Db2 tables)? Sometimes this constrains the use of export, or of load-from-file/pipe. The data can be at risk of modification, inspection or deletion (whether accidental or deliberate) while outside of the RDBMS.

    • the restartability of the solution in the event of runtime errors. This is often a crucial requirement. When copying data between different physical databases (even on the same RDBMS) there are many possibilities for error (network errors, resource issues, concurrency issues, operational issues, etc.). Must the solution guarantee that restarts after a failure resume from the point of failure, or must cleanup happen and the entire job be restarted? The answer can determine the design.

    • if federation exists between the two databases (or if it can be added within the Db2 licence terms), then this is often the easiest practical approach to push or pull content. Local and remote tables appear to be in the same logical database, which simplifies the approach. The data never needs to leave the RDBMS, which also simplifies the restartability of failed jobs and allows the data to remain encrypted if that is a requirement.

    • if SQL-replication or Q-based-replication is licensed then it can be configured to intelligently sync the source and target tables and respect RI if suitably configured. This approach requires significant configuration skills.

    • if the production database is highly available, and/or the archival database is highly available, then the solution must respect the HA approach. Sometimes this prevents the use of LOAD, depending on the operating-system platform of the Db2 server.

    • timing windows for scheduling are often crucial. If the archival+removal job must be guaranteed to fully complete within specific time intervals, this can influence the design pattern.

    • if fastest rollout is a key requirement then range-partitioning is usually the best option.
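On the restartability point above: the safest ordering is copy-then-delete, with an idempotent copy, so that a failed run can simply be rerun without losing or duplicating rows. A minimal sketch of that pattern, with plain Python dicts standing in for the two databases (all names here are hypothetical, not a real Db2 API):

```python
def archive_batch(prod, arch, keys):
    """Restart-safe move of `keys` from `prod` to `arch`.

    Phase 1: copy, skipping rows already present (insert-if-absent
    keeps reruns idempotent). Phase 2: delete from production only
    after the copy is durable. If the job crashes between the phases,
    rerunning it re-copies nothing and just finishes the delete, so
    no row is ever lost.
    """
    for k in keys:
        if k in prod and k not in arch:
            arch[k] = prod[k]
    # ...in a real system, commit the archive transaction here...
    for k in keys:
        prod.pop(k, None)
```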
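On the federation point above: once a nickname makes the remote archive table addressable locally, the move collapses to an `INSERT ... SELECT` plus a `DELETE` in a single transaction, and the data never leaves the database engine. As a runnable stand-in, this sketch uses sqlite's `ATTACH` to play the role of a Db2 nickname; the table names and the `status = 'closed'` archiving condition are illustrative, and the SQL is sqlite, not Db2:

```python
import sqlite3

# "Production" database with some rows to archive.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
prod.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "closed"), (2, "open"), (3, "closed")])
prod.commit()

# ATTACH stands in for a federated nickname: the "remote" archive
# table becomes addressable from the production connection.
prod.execute("ATTACH ':memory:' AS archive")
prod.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, status TEXT)")

# One transaction: either both statements succeed or neither does,
# so no row can be deleted without having been archived.
with prod:
    prod.execute("INSERT INTO archive.orders "
                 "SELECT * FROM orders WHERE status = 'closed'")
    prod.execute("DELETE FROM orders WHERE status = 'closed'")
```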