sql-server · database · triggers · etl · data-warehouse

MS SQL Trigger for ETL vs Performance


I need to know what the impact on the production DB might be of creating triggers on ~30 production tables that capture every UPDATE, DELETE and INSERT statement and write the following information to a separate table: "PK", "Table Name", "Time of modification".
I have limited ability to test this, as I have read-only permissions on both the Prod and Test environments (and I can get one working day for 10 end users to test it).
I estimate that the number of records inserted by these triggers will be around 150-200k daily.
Background: I have a project to deploy a Data Warehouse for a database that is heavily customized, plus there are jobs running every day that manipulate the data. The "Updated On" date column is not maintained (a customization) and hard deletes occur on the tables. We decided to ask the DEV team to add triggers like:

 CREATE TRIGGER [dbo].[triggerName] ON [dbo].[ProductionTable]
    FOR INSERT, UPDATE, DELETE
    AS
    SET NOCOUNT ON;  -- keep the extra "rows affected" messages away from callers

    -- inserted holds the new rows for INSERT and UPDATE statements
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM inserted

    -- deleted holds the old rows for UPDATE and DELETE statements
    -- (so an updated row is logged twice: once from each pseudo-table)
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM deleted

on the ~30 core production tables. Based on this table we will pull the delta from the last 24 hours and push it to the Data Warehouse staging tables.
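As a rough illustration, the daily pull could look something like the sketch below. Only For_ETL_Warehouse and its columns come from the trigger above; the staging table name and the data columns are made up for the example, and hard deletes would need their own handling since the deleted rows no longer exist in the source table.

    -- Sketch only: pull rows touched in the last 24 hours for one table.
    -- DW_Staging_ProductionTable, ColumnA and ColumnB are hypothetical names.
    INSERT INTO DW_Staging_ProductionTable (PK_ID, ColumnA, ColumnB)
    SELECT p.PK_ID, p.ColumnA, p.ColumnB
    FROM dbo.ProductionTable AS p
    WHERE p.PK_ID IN (SELECT DISTINCT Regular_PK
                      FROM dbo.For_ETL_Warehouse
                      WHERE Table_Name = 'ProductionTable'
                        AND Insert_Date >= DATEADD(HOUR, -24, GETDATE()));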

If anyone has dealt with a similar issue and can help me estimate how this could impact performance on the production database, I would really appreciate it. (If it works, I am saved; if not, I need to propose another solution. Mirroring or replication would currently be hard to get approved, as the local DEVs have no idea how to set them up...) Other ideas on how to handle this situation or how to perform tests are welcome (my deadline is Friday 26-01).


Solution

  • First of all, I would suggest you encode the table name as a smaller data type rather than a character one (30 tables fit easily into a tinyint); see the sketch after the list below.

    Second of all, you need to understand how big the payload you are going to write is, and how it will be written:

    1. if you choose a suitable clustered index (the date column), the server just needs to write the rows out sequentially, one after another. That is a trivially easy job even if you insert all 200k rows at once.

    2. if you encode the table name as a tinyint, then it basically has to write:

      • 1 byte (table name) + the PK size (hopefully numeric, so <= 8 bytes) + 8 bytes for the datetime - so approximately 17 bytes per row on the data page, plus any indexes and the log file. This is very lightweight and again will put no "real" pressure on SQL Server.
    3. The trigger itself adds a small overhead, but with the number of rows you are talking about it is negligible.
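
    A minimal sketch of what points 1 and 2 could look like, assuming a numeric PK and a tinyint lookup for the 30 table names (all object names here are made up, and the trigger's insert list would change accordingly):

        -- Lookup table mapping each audited table to a 1-byte code.
        CREATE TABLE dbo.ETL_Table_Lookup (
            Table_Code tinyint NOT NULL PRIMARY KEY,
            Table_Name sysname NOT NULL
        );

        -- Log table: roughly 1 + 8 + 8 = 17 bytes per row before indexes.
        CREATE TABLE dbo.For_ETL_Warehouse (
            Table_Code  tinyint  NOT NULL,
            Regular_PK  bigint   NOT NULL,
            Insert_Date datetime NOT NULL DEFAULT GETDATE()
        );

        -- Ever-increasing clustered key: new rows are simply appended at the end.
        CREATE CLUSTERED INDEX CIX_For_ETL_Warehouse_Insert_Date
            ON dbo.For_ETL_Warehouse (Insert_Date);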

    I have seen systems that do similar things on a much larger scale with close to zero effect on the workload, so I would say it is a safe bet. The only problem with this approach is that it will not work in some cases (e.g. DML statements that use an OUTPUT clause without an INTO target against the audited table). But if you do not have that kind of blocker, go for it.
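
    If I recall the restriction correctly, the typical blocker looks like the first statement below: OUTPUT without INTO is rejected on a target table that has an enabled trigger, while redirecting the output into a table variable keeps working (PK value is made up):

        -- Fails once ProductionTable has an enabled trigger:
        -- OUTPUT without INTO is not allowed on a target table with triggers.
        DELETE FROM dbo.ProductionTable
        OUTPUT deleted.PK_ID
        WHERE PK_ID = 42;

        -- Works: OUTPUT ... INTO a table variable is still allowed.
        DECLARE @removed TABLE (PK_ID int);
        DELETE FROM dbo.ProductionTable
        OUTPUT deleted.PK_ID INTO @removed
        WHERE PK_ID = 42;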

    I hope it helps.