We have a master Person record and one or more duplicate Person records, and we are merging their data, prioritising the master over the duplicate(s).
For phone numbers, the goal is to put a single number into the Phone field and any other numbers into a notes field (so as not to discard them completely). Records may or may not contain a phone number.
For neatness, we don't want to fill the notes field with a bunch of numbers that are basically the same. So we don't want the field to contain both of these:
(1234) 123123
1234 123123
This would be easy if we could just discard the formatting and spaces, but we need to retain those (except for whitespace at the beginning/end).
We started by creating a Structure (not sure why we have a Structure versus a Class, but anyway):

    Friend Structure PhoneNumber
        ' The number as entered, formatting included
        Private _Raw As String
        Public Property Raw() As String
            Get
                Return _Raw
            End Get
            Set(ByVal value As String)
                _Raw = value
            End Set
        End Property

        ' The number with every non-digit character removed
        Private _Stripped As String
        Public Property Stripped() As String
            Get
                Return _Stripped
            End Get
            Set(ByVal value As String)
                _Stripped = value
            End Set
        End Property

        Sub New(ByVal num As String)
            Raw = num
            Dim RegexObj As New System.Text.RegularExpressions.Regex("[^\d]")
            Stripped = RegexObj.Replace(num, "")
            MsgBox(num & vbCrLf & Stripped) ' debugging aid only
        End Sub
    End Structure
Then, the merge code looks like this:

    Dim phones As New List(Of PhoneNumber)

    ' The master's number goes in first so that it ends up in the Phone field
    If master.Phone.Trim.Length > 1 Then
        phones.Add(New PhoneNumber(master.Phone.Trim))
    End If

    For Each x As Person In duplicates
        ' AndAlso short-circuits, so Contains only runs for non-empty numbers
        If x.Phone.Trim.Length > 1 AndAlso Not phones.Contains(New PhoneNumber(x.Phone.Trim)) Then
            phones.Add(New PhoneNumber(x.Phone.Trim))
        End If
    Next

    If phones.Count > 0 Then
        master.Phone = phones(0).Raw
    End If

    ' Everything after the first number is kept in the notes
    For i As Integer = 1 To phones.Count - 1
        master.Notes &= vbCrLf & "Alt. Phone: " & phones(i).Raw
    Next
But, obviously, the problem here is that it allows the duplicates. We kind of want the Contains to match on "stripped" values only, but of course it doesn't know to do that.
This already seems like too much code for such a minor feature, but at the moment we're looking at writing something (in the Structure?) that will replace the Contains and match on stripped only. Is there a neater way?
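For what it's worth, the reason Contains lets these through is that a Structure's default equality compares every field, and Raw differs between the two numbers. One option, rather than replacing Contains, is to have the Structure define equality on the stripped value by implementing IEquatable(Of PhoneNumber), which List.Contains picks up through the default equality comparer. A minimal sketch, assuming read-only fields in place of the original properties (a simplification of mine) and a made-up EqualityDemo module:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Friend Structure PhoneNumber
    Implements IEquatable(Of PhoneNumber)

    ' Shared so the pattern is built once rather than per instance
    Private Shared ReadOnly NonDigits As New Regex("\D")

    ' Read-only fields instead of the original properties (assumption, for brevity)
    Public ReadOnly Raw As String
    Public ReadOnly Stripped As String

    Public Sub New(ByVal num As String)
        Raw = num
        Stripped = NonDigits.Replace(num, "")
    End Sub

    ' Two numbers are equal when their digits match, whatever the formatting
    Public Overloads Function Equals(ByVal other As PhoneNumber) As Boolean _
        Implements IEquatable(Of PhoneNumber).Equals
        Return String.Equals(Stripped, other.Stripped, StringComparison.Ordinal)
    End Function

    Public Overrides Function Equals(ByVal obj As Object) As Boolean
        Return TypeOf obj Is PhoneNumber AndAlso Equals(DirectCast(obj, PhoneNumber))
    End Function

    ' Must agree with Equals, so it hashes the stripped digits only
    Public Overrides Function GetHashCode() As Integer
        Return If(Stripped Is Nothing, 0, Stripped.GetHashCode())
    End Function
End Structure

Module EqualityDemo
    Sub Main()
        Dim phones As New List(Of PhoneNumber)
        phones.Add(New PhoneNumber("(1234) 123123"))
        ' Same digits, different formatting: Contains now reports a match
        Console.WriteLine(phones.Contains(New PhoneNumber("1234 123123"))) ' True
    End Sub
End Module
```

With equality defined this way, the merge loop can stay as it is, and the list still keeps the master's number in first position.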
Code is in VB, but C# answers welcome.
Remember too that we have to prioritise the master, so if we use LINQ and Distinct we need to ensure we don't lose the sort order (that's my understanding).
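On the LINQ point: as far as I can tell, the documentation for Distinct does not promise anything about ordering, whereas GroupBy is documented to yield groups in the order in which each key first appears in the source. So grouping on the stripped digits and taking the first element of each group should keep the master's formatting in front, provided the master's number is first in the input. A rough sketch, where the module, the DistinctPhones helper and the sample numbers are all made up for illustration:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhoneMergeLinqSketch
    Private ReadOnly NonDigits As New Regex("\D")

    ' Hypothetical helper: candidates must be ordered master first, then duplicates.
    ' Returns one formatted number per distinct digit sequence, in first-seen order.
    Function DistinctPhones(ByVal candidates As IEnumerable(Of String)) As List(Of String)
        Return candidates _
            .Where(Function(p) Not String.IsNullOrEmpty(p)) _
            .Select(Function(p) p.Trim()) _
            .Where(Function(p) p.Length > 1) _
            .GroupBy(Function(p) NonDigits.Replace(p, "")) _
            .Select(Function(g) g.First()) _
            .ToList()
    End Function

    Sub Main()
        Dim input As String() = New String() {"(1234) 123123", " 1234 123123 ", "", "5678 555555"}
        For Each p As String In DistinctPhones(input)
            Console.WriteLine(p)
        Next
        ' Prints:
        ' (1234) 123123
        ' 5678 555555
    End Sub
End Module
```

The Length > 1 filter mirrors the threshold in the code above; the first entry of the result would go into Phone and the rest into the notes.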
Figured out that a better way to do this is to use a Dictionary. That way we can do without the Structure and use Dictionary lookups on both the Key (the stripped phone number) and the Value (the formatted original).
Something like this:

    Dim RegexObj As New System.Text.RegularExpressions.Regex("[^\d]")
    ' Key: digits only. Value: the number as originally formatted.
    Dim phones As New Dictionary(Of String, String)

    master.Phone = master.Phone.Trim
    If master.Phone.Length > 1 Then
        phones.Add(RegexObj.Replace(master.Phone, ""), master.Phone)
    End If

    For Each x As Person In duplicates
        x.Phone = x.Phone.Trim
        ' AndAlso short-circuits, so the lookup only runs for non-empty numbers
        If x.Phone.Length > 1 AndAlso Not phones.ContainsKey(RegexObj.Replace(x.Phone, "")) Then
            phones.Add(RegexObj.Replace(x.Phone, ""), x.Phone)
        End If
    Next

    ' The first entry becomes the master's phone; the rest go to the notes
    If phones.Count > 0 Then
        master.Phone = phones.First.Value
        phones.Remove(phones.First.Key)
    End If

    For Each entry As KeyValuePair(Of String, String) In phones
        ' Only add a line break when the notes already contain something
        master.Notes &= If(String.IsNullOrEmpty(master.Notes.Trim), "", vbCrLf) _
            & "Alt. Phone: " & entry.Value
    Next
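
One caveat with the Dictionary version: the documentation describes the enumeration order of Dictionary(Of TKey, TValue) as undefined, so relying on phones.First to be the master's number works in practice but is not guaranteed. A variation that keeps the same idea, sketched below, uses a HashSet for the stripped keys and a List for the formatted values, so the order is explicit. The module and the MergePhones helper are hypothetical names, and the numbers are passed in as plain strings rather than Person objects to keep the sketch self-contained:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PhoneMergeOrderedSketch
    Private ReadOnly NonDigits As New Regex("\D")

    ' Hypothetical helper: returns the formatted numbers in first-seen order,
    ' with the master's number first whenever it is present.
    Function MergePhones(ByVal masterPhone As String, _
                         ByVal duplicatePhones As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)     ' stripped digits already encountered
        Dim ordered As New List(Of String)     ' formatted numbers, in insertion order

        Dim candidates As New List(Of String)
        candidates.Add(masterPhone)
        candidates.AddRange(duplicatePhones)

        For Each candidate As String In candidates
            If candidate Is Nothing Then Continue For
            Dim trimmed As String = candidate.Trim()
            ' HashSet.Add returns False when the key is already present
            If trimmed.Length > 1 AndAlso seen.Add(NonDigits.Replace(trimmed, "")) Then
                ordered.Add(trimmed)
            End If
        Next

        Return ordered
    End Function
End Module
```

The caller would then assign the first element (if any) to master.Phone and append the remaining elements to master.Notes, exactly as in the loop above.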