Search code examples
sql-server-2008-r2oracle11gchecksumdata-migration

Checksum Validation after migration from Oracle to SQL Server


I am migrating a large database from oracle 11g to SQL SERVER 2008R2 using SSIS. How can the data integrity can be validated for numeric data using checksum?


Solution

  • In the past, I've always done this using a few application controls. It should be something that is easy to compute on both platforms.

    Frequently, the end result is a query like:

    select count(*)
    ,      count(distinct col1) col1_dist_cnt
    ...
    ,      count(distinct col99) col99_dist_cnt
    ,      sum(col1) col1_sum
    ...
    ,      sum(col99) col99_sum
    from   table
    

    Spool to file, Excel or database and compare outcome. Save for project management and auditors.

    Please read more on application control here. I wrote it for checks between various financial reporting systems for the regulatory reporting, so this approach serves most cases.

    If exactly one field value is wrong, it will always show up. Two errors might compensate each other. For example row 1 col 1 gets the value from row 2 col 1.

    To detect for that, multiply each value with something unique for the row. For instance, if you have a unique ID column or identity that is included in the migration too:

    ,      sum(ID * col1) col1_sum
    ...
    ,      sum(ID * col99) col99_sum
    

    When you get number overflows, first try using the largest available precision (especially on SQL Server sometimes difficult). If not feasibly anymore, use mod function. Only few types of error are hidden by mod.

    Icing on the cake is to auto generate these statements. On Oracle look at user_tables, user_tab_columns. On SQL Server look at syscolumns etc.