Tags: c#, sql, .net, sql-server, data-transfer

Best way to program a nightly data transfer, identity column running high


I'm writing an application that needs to synchronize its own database with another database every night. When the data is pulled over, it's also transformed programmatically in quite a complex way.

Now I see two possible approaches:

  • method A: every night I delete all previously imported rows and do a full re-import. This method is expensive, but it's easy to program and maintain, and overall robust.

  • method B: I save a reference to the original data for every imported row and do an insert, update, or delete depending on whether the data is new, changed, or removed (keeping the reference alone is quite complex because there's no simple one-to-one relation); a rough sketch follows below.
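
For illustration, here's a very rough sketch of what method B could look like as a single MERGE statement, assuming a staging table holding the transformed data and a hypothetical SourceRef column linking imported rows to their origin (all object names here are made up):

    -- Unrelated rows have SourceRef = NULL, so they are never touched.
    MERGE [dbo].[TargetTable] AS t
    USING [dbo].[StagingTable] AS s
        ON t.[SourceRef] = s.[SourceRef]
    WHEN MATCHED AND t.[Data] <> s.[Data] THEN
        UPDATE SET t.[Data] = s.[Data]      -- changed rows
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ([SourceRef], [Data])
        VALUES (s.[SourceRef], s.[Data])    -- new rows
    WHEN NOT MATCHED BY SOURCE AND t.[SourceRef] IS NOT NULL THEN
        DELETE;                             -- removed rows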

Clearly method B is more complex and might more easily lead to bugs, so I prefer method A. However, since about 100,000 rows are inserted every night, the identity column will run very high over time. Simply resetting it to 1 every night is not an option, because the imported rows are mixed with other rows that have nothing to do with the data transfer, and that last set of rows has to remain unaffected by the transfer.
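
For comparison, method A could scope the nightly wipe with the same kind of marker so the unrelated rows are never affected (again, SourceRef is a hypothetical column; a simple bit flag would work just as well):

    -- Delete only the rows that came from the nightly import...
    DELETE FROM [dbo].[TargetTable]
    WHERE [SourceRef] IS NOT NULL;

    -- ...then re-import everything from the (already transformed) staging data.
    INSERT INTO [dbo].[TargetTable] ([SourceRef], [Data])
    SELECT [SourceRef], [Data]
    FROM [dbo].[StagingTable];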

The main question I have is whether it's a problem at all that our identity column will run high (an increase of about 36 million every year). It's not tidy, but will there eventually be a performance hit? And what will happen if we ever reach the maximum int value? Has anyone faced similar challenges who can share their experience?


Solution

  • will there eventually be a performance hit

    I don't see why there should be. The size of the field stays the same (say, 32 bits), so all operations on it are performed at the same speed, whether the value of the field is 10 or 2,000,000,000.

    If your identity column is an int (32 bits), then it will last for 2,147,483,647 / 36,000,000 ≈ 59 years.
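
    If you want to keep an eye on it, you can check the remaining headroom with IDENT_CURRENT (the 36,000,000-per-year rate is the figure from the question; BigTable is the demo table created below):

    -- How much int headroom is left, and roughly how many years that buys
    -- at ~36 million inserts per year.
    SELECT
        IDENT_CURRENT('dbo.BigTable')                              AS CurrentIdentity,
        2147483647 - CAST(IDENT_CURRENT('dbo.BigTable') AS bigint) AS ValuesRemaining,
        (2147483647 - CAST(IDENT_CURRENT('dbo.BigTable') AS bigint))
            / 36000000.0                                           AS YearsRemaining;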

    It is very easy to check what happens when you reach the maximum of 2,147,483,647. Create a table in tempdb (I'm using SQL Server 2008 for this test).

    USE [tempdb]
    GO
    
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE TABLE [dbo].[BigTable](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [Data] [int] NOT NULL,
     CONSTRAINT [PK_BigTable] PRIMARY KEY CLUSTERED 
    (
        [ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    

    Set identity to high value:

    USE [tempdb]
    GO
    
    DBCC CHECKIDENT ("[dbo].[BigTable]", RESEED, 2147483645);
    

    Try to insert 10 rows:

    USE [tempdb]
    GO
    
    INSERT INTO [dbo].[BigTable]
    ([Data])
    VALUES
    (0)
    
    GO 10
    

    This is what I get in the output window:

    Beginning execution loop
    
    (1 row(s) affected)
    
    (1 row(s) affected)
    
    (1 row(s) affected)
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Msg 8115, Level 16, State 1, Line 2
    Arithmetic overflow error converting IDENTITY to data type int.
    Arithmetic overflow occurred.
    ** An error was encountered during execution of batch. Continuing.
    Batch execution completed 10 times.
    

    This is the result:

    USE [tempdb]
    GO
    
    SELECT [ID] ,[Data]
    FROM [dbo].[BigTable]
    GO
    
    
    ID  Data
    2147483645  0
    2147483646  0
    2147483647  0
    
    
    DBCC CHECKIDENT ("[tempdb].[dbo].[BigTable]", NORESEED);
    
    Checking identity information: current identity value '2147483647', current column value '2147483647'.
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    

    So, yes, there will be problems when you exhaust all int values. You can use bigint instead. It is 64 bits (8 bytes) and will last much, much longer: 9,223,372,036,854,775,807 / 36,000,000 ≈ 256,204,778,801 years. The performance of bigint is pretty much the same as int on 64-bit computers. It may be somewhat slower than int simply because it is twice the size, so the server has to read/write twice as many bytes to disk and use twice as much memory.
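
    If you do decide to switch, widening the column in place is possible. A minimal sketch, using the demo table above and assuming you can afford the locking and the table rewrite this causes:

    -- The primary key constraint must be dropped before the column type can change.
    ALTER TABLE [dbo].[BigTable] DROP CONSTRAINT [PK_BigTable];

    -- Widening int -> bigint keeps the IDENTITY property and the existing values.
    ALTER TABLE [dbo].[BigTable] ALTER COLUMN [ID] bigint NOT NULL;

    ALTER TABLE [dbo].[BigTable] ADD CONSTRAINT [PK_BigTable]
        PRIMARY KEY CLUSTERED ([ID] ASC);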

    In terms of performance, what is really important is that you rebuild indexes and update statistics on your table after you delete a lot of rows and add a lot of rows, because otherwise all statistics will be severely skewed. Here I mean the performance of the rest of the system that uses this table, not the process of deleting and adding rows itself.
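
    For example, with the standard maintenance statements (using the demo table name):

    -- Rebuilding all indexes also refreshes their statistics with a full scan.
    ALTER INDEX ALL ON [dbo].[BigTable] REBUILD;

    -- Column statistics that are not tied to an index need a separate update.
    UPDATE STATISTICS [dbo].[BigTable] WITH FULLSCAN;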

    To improve the performance of the nightly sync process itself, which deletes a lot of rows and adds a lot of rows, consider disabling all indexes on the table before the changes and re-enabling (rebuilding) them after the changes.
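
    A sketch of that pattern (IX_BigTable_Data is a hypothetical nonclustered index; note that disabling the clustered index would make the whole table inaccessible, so only disable nonclustered ones):

    -- Before the nightly changes: disable the nonclustered indexes.
    ALTER INDEX [IX_BigTable_Data] ON [dbo].[BigTable] DISABLE;

    -- ... run the delete + re-import here ...

    -- After the changes: REBUILD re-enables the disabled indexes.
    ALTER INDEX ALL ON [dbo].[BigTable] REBUILD;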