Search code examples
sql-serverperformancesql-delete

Delete large number of rows


I want to delete records from Table1 (Rows > 300 million records) based on this criteria (returns > 100 million records): Column1 IS NULL AND Column2 IS NULL AND Column3 IS NULL AND Column4 IS NULL AND Column5 IS NULL AND Column6 IS NULL AND Column7 IS NULL

Table definition:

CREATE TABLE [dbo].[Table1](
    [ID] [uniqueidentifier] NOT NULL,
    [Column1] [float] NULL,
    [Column2] [float] NULL,
    [Column3] [float] NULL,
    [Column4] [float] NULL,
    [Column5] [float] NULL,
    [Column6] [float] NULL,
    [Column7] [float] NULL,
    [Table2ID] [uniqueidentifier] NULL, 
PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Note: Table2ID is the foreign key contraint from Table2

CREATE TABLE [dbo].[Table2](
    [ID] [uniqueidentifier] NOT NULL,
    [Column1] [date] NULL,
    [Column2] [tinyint] NULL,
    [Column3] [tinyint] NULL,
 CONSTRAINT [PK_Table2_ID] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

I also want to delete all orphan records from Table2 (Row count around 10 million records) which are no longer referenced in Table1. Here are the approaches which I took:

a. Create temp table #table1 with Table1ID and Table2ID columns. Capture all relevant Table1IDs and run the below delete query.

Delete from Table1 where ID in (select Table1ID from #table1)

Delete from Table2 where ID in (select Table2ID from #table1)

The above query is taking enormous time possibly because of table scans.

b. Everything same as above but create unique clustered index on Table1ID column and unique index on Table2ID in #table1 and run the below query:

Delete from Table1 t1 join #table1 tmp1 on t1.ID=tmp1.Table1ID

Delete from Table2 t2 join #table1 tmp1 on t1.ID=tmp1.Table2ID

Is there a better way to handle this scenario? What is the best possible way to delete records from Table1 and Table2?

Note: I am aware of the approach where we create a new table, dump all the relevant data and rename it. Please provide suggestions on alternative approaches - pros and cons.


Solution

  • As You really want to delete high percentage, so you may consider this approach

     SELECT col1, col2, ... INTO #SomeHoldingtable
                FROM MyTable WHERE ..condition..
    
     TRUNCATE TABLE MyTable
    
     INSERT MyTable (col1, col2, ...)
               SELECT col1, col2, ... FROM #SomeHoldingtable