I have 2 tables 'tbMemeberInfo' as well as 'tbDocument'.
When a Document is uploaded it records a field InfoID in tbDocument, which is the PK(MemberInfoID) in tbMemeberInfo.
There are however duplicates in tbMemeberInfo. Users have an 'AgreementNo' and an 'IDNumber' and the duplicate records only contain 1 or the other.
I need to merge these records to insert the 'AgreementNo' and 'IDNumber' into 1 of the records and remove the other.
Here are the table structures and code I have tried so far...
tbMemberInfo
MemberInfoID AgreementNo IDNumber DOB Initials FirstName LastName Language Role CountryID Email
861616 124346665 NULL 1976-08-24 00:00:00.000 DV DAMIAN Example English Member 1 damian.example@mail.com
866185 NULL 7608241234123 1976-08-24 00:00:00.000 DV DAMIAN Example English Member 1 damian.example@mail.com
tbDocument
DocumentID r_object_id DocumentTypeID UniqueDocumentNo ContentLength ContentType FileName CreatedUserID CreatedDate InfoID
293787 0900d431800bc987 13 PPS156329L 1753819 application/pdf Example_DV_PROV_APP_2009110316140300[1].pdf 362 2010-01-13 16:21:46.250 861616
293794 0900d431800bc998 530 PPS156335O 66750 image/tiff Example, DV DRS REPORT.tif 362 2010-01-13 16:26:48.420 861616
SQL Code
DECLARE
@MemberInfoID int
,@AgreementNo varchar(50)
,@IDNumber varchar
,@DOB datetime
,@TitleID int
,@FirstName varchar(150)
,@LastName varchar(150)
,@ModifiedDate datetime
SELECT @AgreementNo = AgreementNo, @IDNumber = IDNumber, @FirstName = FirstName, @LastName = LastName, @DOB = DOB
FROM tbMemberInfo mi
INNER JOIN tbDocument d
ON mi.MemberInfoID = d.InfoID
WHERE (mi.AgreementNo = '') OR (mi.IDNumber = '')
--SELECT @IDNumber = IDNumber From tbMemberInfo mi
--INNER JOIN tbDocument d
--ON mi.MemberInfoID = d.InfoID
--WHERE (mi.AgreementNo = '')
--SELECT @AgreementNo = AgreementNo From tbMemberInfo mi
--INNER JOIN tbDocument d
--ON mi.MemberInfoID = d.InfoID
--WHERE (mi.IDNumber = '') AND (FirstName = @FirstName) AND (LastName = @LastName) AND (DOB = @DOB)
UPDATE tbMemberInfo
SET [IDNumber] = @IDNumber, [AgreementNo] = @AgreementNo, ModifiedDate = GETDATE()
FROM tbMemberInfo mi
INNER JOIN tbDocument d
ON mi.MemberInfoID = d.InfoID
WHERE (IDNumber = '') OR (AgreementNo = '') AND (FirstName = @FirstName)
AND (LastName = @LastName) AND (DOB = @DOB)
GROUP BY MemberInfoID
None of this is working. It puts a '7' into all the IDNumber columns. Any ideas on how to do this. I havn't got round to deleting the duplicates, I want to merge first.
You should use IDNumber is NULL instead of = ''
Your where clause has some problems, you should write something like:
WHERE (IDNumber is Null or AgreementNo is NULL) AND (FirstName = @FirstName AND LastName = @LastName AND DOB = @DOB)
my suggestion would be to create a tmp table with the exact same schema as tbMemberInfo (this will help you organize your steps and simplify the task a little bit); group entries in the way you want and merge them into the tmp table (if I understand correctly you are using FirstName, LastName and DateOfBirth to identify distinct entries). And at the end truncate everything in tbMemberInfo and fill it up with what you have in tmp table.. Here is a little more detail:
INSERT INTO tmp
SELECT * FROM tbMemberInfo;
UPDATE tmp t1 INNER JOIN tbMemberInfo t2
ON t1.FirstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.DOB = t2.DOB
SET t1.AgreementNo = t2.AgreementNo
WHERE t1.AgreementNo IS NULL
AND t1.MemberInfoID != t2.MemberInfoID;
UPDATE tmp t1 INNER JOIN tbMemberInfo t2
ON t1.FirstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.DOB = t2.DOB
SET t1.IDNumber = t2.IDNumber
WHERE t1.IDNumber is NULL
AND t1.MemberInfoID != t2.MemberInfoID;
-- Just to make sure there are no entries left without a value
SELECT * FROM tmp WHERE IDNumber is NULL OR AgreementNo is NULL;
-- now we are going to keep the row with the lowest ID
DELETE t1 FROM tmp t1, tmp t2
WHERE t1.MemberInfoID < t2.MemberInfoID
AND t1.FirstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.DOB = t2.DOB;
-- update the InfoID intbDocument table
UPDATE tbDocument t1 JOIN tbMemberInfo t2
ON t1.InfoID = t2.MemberInfoID
JOIN tmp t2
ON t2.FirstName = t3.FirstName
AND t2.LastName = t3.LastName
AND t2.DOB = t3.DOB
SET t1.InfoID = t3.tbMemberInfoID
WHERE t1.InfoID NOT IN (SELECT MemberInfoID FROM tmp);
TRUNCATE tbMemberInfo;
INSERT INTO tbMemberInfo
SELECT * FROM tmp;
NOTE: I did not test these queries and just wrote them very quickly, so they are probably buggy, but you'll get the idea and can fix them yourself. Don't run these on your original data, make a duplicate and test them first.