This has been an adventure. I started with the looping duplicate query located in my previous question, but each loop would go over all 17 million records, meaning it would take weeks (just running *select count * from MyTable*
takes my server 4:30 minutes using MSSQL 2005). I gleamed information from this site and at this post.
And have arrived at the query below. The question is, is this the correct type of query to run on 17 million records for any type of performance? If it isn't, what is?
SQL QUERY:
DELETE tl_acxiomimport.dbo.tblacxiomlistings
WHERE RecordID in
(SELECT RecordID
FROM tl_acxiomimport.dbo.tblacxiomlistings
EXCEPT
SELECT RecordID
FROM (
SELECT RecordID, Rank() over (Partition BY BusinessName, latitude, longitude, Phone ORDER BY webaddress DESC, caption1 DESC, caption2 DESC ) AS Rank
FROM tl_acxiomimport.dbo.tblacxiomlistings
) al WHERE Rank = 1)
Seeing the QueryPlan would help.
Is this feasible?
SELECT m.*
into #temp
FROM tl_acxiomimport.dbo.tblacxiomlistings m
inner join (SELECT RecordID,
Rank() over (Partition BY BusinessName,
latitude,
longitude,
Phone
ORDER BY webaddress DESC,
caption1 DESC,
caption2 DESC ) AS Rank
FROM tl_acxiomimport.dbo.tblacxiomlistings
) al on (al.RecordID = m.RecordID and al.Rank = 1)
truncate table tl_acxiomimport.dbo.tblacxiomlistings
insert into tl_acxiomimport.dbo.tblacxiomlistings
select * from #temp