This is a theoretical scenario, and I am more than amateur when it comes to large scale SQL databases...
How would I go about inserting around 2million records into an existing database off 6million records (table1 into table2), whilst at the same time using email de-duplication (some subscribers may already exist in site2, but we don't want to insert those that already exist)?
I understand how to simply get the records from site 1 and add them into site 2, but how would we do this on such a large scale, and not causing data duplication? Any reading sources would be more than helpful for me, as ive found that a struggle.
i.e.: Table 1: site1Subscribers
site1Subscribers(subID, subName, subEmail, subDob, subRegDate, subEmailListNum, subThirdParties)
Table 2: site2Subscribers
site2Subscribers(subID, subName, subEmail, subDob, subRegDate, subEmailListNum, subThirdParties)
I would try something like this:
insert into site2Subscribers
select * from site1Subscribers s1
left outer join site2Subscribers s2
on s1.subEmail = s2.subEmail
where s2.subEmail is null;
The left outer join
along with the null check will return only those rows from site1Subscribers
that have no matching entry in site2Subscribers
.