
Comparing 2 lists in Excel with VBA Regex


I want to use regular expressions to compare two lists (columns) in Excel to find matches. This is quite a complex operation. I have done it in the past using several different (non-VBA) Excel functions, but that proved awkward at best, so I wanted to try an all-in-one VBA solution, if possible.

The first column has names with irregularities (such as quoted nicknames, suffixes such as 'jr' or 'sr', and parentheses around 'preferred' versions of first names). Additionally, when middle names are present, they may be either the full name or just the initial.

The order in the first column is:

 <first name or initial>
 <space>
 <any parenthetical 'preferred' names - if they exist>
 <space>
 <middle name or initial - if it exists>
 <space>
 <quoted nickname or initial - if it exists>
 <space>
 <last name>
 <comma - if necessary><space - if necessary><suffix - if it exists>

The order in the second column is:

 `<lastname><space><suffix>,<firstname><space><middle name, if it exists>`

with none of the 'irregularities' that the first column has.

My main objective is to 'clean' the first column into this order:

 `lastname-space-suffix,firstname-space-preferred name-space-middle name-space-nickname`

Although this keeps the 'irregularities', I could possibly use some sort of flag in my comparison code to alert me to them on a case-by-case basis.

I've been trying several patterns, and this is my most recent:

["]?([A-Za-z]?)[.]?["]?[.]?[\s]?[,]?[\s]?

However, I would like to allow for the last name and suffix (if they exist). I tested it with the Global option set, but I couldn't figure out how to separate the last name and the suffix, for example via backreferences.

I would then like to compare the last name, first name, and middle initial between the two lists (most middle names in the first list are only initials).

 An example would be:
 (1st list)
 John (Johnny) B. "Abe" Smith, Jr.
 turned into:
 Smith Jr,John (Johnny) B "Abe"
 or
 Smith Jr,John B

 and
 (2nd list)
 Smith Jr,John Bertrand
 turned into:
 Smith Jr,John B

 Then run a comparison between the two columns.

What would be a good starting or continuing point for this list comparison?


Apr 10, 2012 addendum:

As a side note, I will need to eliminate the quotes from the nicknames and the parentheses from the preferred names. Can I just break the capture groups down further into subgroups, as in the examples below?

 (?:  ([ ] \( [^)]* \)))?  # (2) parenthetical 'preferred' name (optional) 
 (?:  ([ ] (["'] ) .*?) \6 )? # (5,6) quoted nickname or initial (optional) 

Can I group them like this:

 (?:(([ ])(\()([^)]*)(\))))? # (2) parenthetical 'preferred' name (optional) 
 not sure how to do this one -  # (5,6) quoted nickname or initial (optional) 

I tried them in Regex Coach and RegExr and they worked fine. In VBA, however, when I used the backreferences \11,\5 in the replacement, all that was returned was the first name, the digit 1, and a comma (e.g. "Carl1,"). I'm going back to check for typos. Thanks for any help.


Apr 17, 2012 addendum:

There was one name 'situation' I overlooked, and that is last names consisting of 2 or more words, e.g. 'St Cyr' or 'Von Wilhelm'.
Would the following addition

 `((St|Von)[ ])?`

work in the regex you offered, placed like this?

 `((St|Von)[ ])?([^\,()"']+)`

My tests in Regex Coach and RegExr haven't quite worked: the replacement returns 'St' preceded by a space.


Solution

  • Redo -

    This is a different approach. It might work in your VBA; it is just an example. I tested it in Perl and it worked well, but I won't show the Perl code,
    just the regexes and some explanations.

    This is a two-step process.

    1. Normalize the column text
    2. Do the main parse

    Normalize Process

    • Get a column value
    • Strip out all dots . - Globally search for \. , replace with nothing ''
    • Collapse whitespace into single spaces - Globally search for \s+ , replace with a single space [ ]

    (Note that if a value can't be normalized, I don't see much chance of success no matter what is tried.)
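    A minimal Python sketch of the normalize step, handy for checking the rules outside Excel. It is a port, not the answer's own code (which is VBA), and the name `normalize` is an assumption. It removes dots first and collapses whitespace second, in the order listed above.

```python
import re

_DOT = re.compile(r"\.")   # every literal period
_WS = re.compile(r"\s+")   # any run of whitespace (spaces, tabs, line breaks)


def normalize(value: str) -> str:
    """Remove all periods, then collapse each whitespace run into one space."""
    return _WS.sub(" ", _DOT.sub("", value))


if __name__ == "__main__":
    print(repr(normalize('John (Johnny)   Bertrand "Abe"   Smith, Jr.  ')))
    print(repr(normalize("Smith Jr.,John Bertrand  ")))
```

    The VBA modules further down apply the same two replacements in the opposite order (whitespace first, then dots). For an input such as "A . B" that order leaves two spaces ("A  B"), whereas dots-first yields "A B".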

    Main Parse Process

    After normalizing a column value (do this for both columns), run it through these regexes.

    Column 1 regex

    ^
      [ ]?
      ([^\ ,()"']+)                        # (1)     first name or initial          (required)
      (?:  ([ ] \( [^)]* \))    )?         # (2)     parenthetical 'preferred' name (optional)
      (?:
           ([ ] [^\ ,()"'] )               # (3,4)   middle initial OR name         (optional)
           ([^\ ,()"']*)                   #         name and initial are both captured
      )?
      (?:  ([ ] (["'] ) .*?) \6 )?         # (5,6)   quoted nickname or initial     (optional)
      [ ]  ([^\ ,()"']+)                   # (7)     last name                      (required)
      (?:
            [, ]* ([ ].+?) [ ]?            # (8)     suffix                         (optional)
          | .*?
      )?
    $
    

    The replacement depends on what you want.
    Three types are defined (replace $ with \ as needed):

    1. type 1a full middle - $7$8,$1$2$3$4$5$6
    2. type 1b middle initial - $7$8,$1$2$3$5$6
    3. type 2 middle initial - $7$8,$1$3

    Example conversion:

    Input (raw)               = 'John (Johnny) Bertrand "Abe" Smith, Jr.  '
    Out type 1a full middle    = 'Smith Jr,John (Johnny) Bertrand "Abe"'
    Out type 1b middle initial = 'Smith Jr,John (Johnny) B "Abe"'
    Out type 2 middle initial  = 'Smith Jr,John B'
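    A Python sketch of the column 1 parse and its three replacement types. It assumes Python's `re` backtracks on this pattern the way Perl and VBScript do (the answer reports testing only in Perl and VBA). `col1_keys` is a hypothetical helper name. The `$n` replacements are rebuilt by concatenating groups, and a group that did not participate counts as an empty string, which is what the sample outputs imply.

```python
import re


def normalize(value: str) -> str:
    # Strip periods, then collapse whitespace runs to a single space.
    return re.sub(r"\s+", " ", value.replace(".", ""))


# The answer's Column 1 regex in verbose mode ([ ] keeps a literal space).
COL1 = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                            # (1) first name or initial
    (?: ([ ] \( [^)]* \)) )?                 # (2) parenthetical preferred name
    (?: ([ ] [^\ ,()"']) ([^\ ,()"']*) )?    # (3,4) middle initial, rest of middle
    (?: ([ ] (["']) .*?) \6 )?               # (5,6) quoted nickname, quote char
    [ ] ([^\ ,()"']+)                        # (7) last name
    (?: [, ]* ([ ].+?) [ ]? | .*? )?         # (8) suffix
    $
""", re.VERBOSE)


def col1_keys(raw: str) -> dict:
    """Return the type 1a / 1b / 2 strings for one column 1 value ({} if no match)."""
    m = COL1.match(normalize(raw))
    if m is None:
        return {}

    def g(*nums: int) -> str:
        # Concatenate groups; an unmatched optional group contributes "".
        return "".join(m.group(n) or "" for n in nums)

    head = g(7, 8) + ","                     # mirrors $7$8,
    return {
        "1a": head + g(1, 2, 3, 4, 5, 6),
        "1b": head + g(1, 2, 3, 5, 6),
        "2": head + g(1, 3),
    }


if __name__ == "__main__":
    for kind, text in col1_keys('John (Johnny) Bertrand "Abe" Smith, Jr.  ').items():
        print(kind, text)
```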
    

    Column 2 regex

    ^
      [ ]?
      ([^\ ,()"']+)                  # (1)     last name                      (required)
      (?: ([ ] [^\ ,()"']+) )?       # (2)     suffix                         (optional)
      ,
      ([^\ ,()"']+)                  # (3)     first name or initial          (required)
      (?:
          ([ ] [^\ ,()"'])           # (4,5)   middle initial OR name         (optional)
          ([^\ ,()"']*)
      )?
      .*
    $
    

    The replacement depends on what you want.
    Two types are defined (replace $ with \ as needed):

    1. type 1 full middle - $1$2,$3$4$5
    2. type 2 middle initial - $1$2,$3$4

    Example conversion:

    Input                     = 'Smith Jr.,John Bertrand  '
    Out type 1 full middle    = 'Smith Jr,John Bertrand'
    Out type 2 middle initial = 'Smith Jr,John B'
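    The same idea for column 2, again as a hedged Python sketch (`col2_keys` is a hypothetical name). Its type 2 string is the one meant to be compared with the column 1 type 2 string.

```python
import re


def normalize(value: str) -> str:
    # Strip periods, then collapse whitespace runs to a single space.
    return re.sub(r"\s+", " ", value.replace(".", ""))


# The answer's Column 2 regex in verbose mode.
COL2 = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                            # (1) last name
    (?: ([ ] [^\ ,()"']+) )?                 # (2) suffix, with its leading space
    ,
    ([^\ ,()"']+)                            # (3) first name or initial
    (?: ([ ] [^\ ,()"']) ([^\ ,()"']*) )?    # (4,5) middle initial, rest of middle
    .*
    $
""", re.VERBOSE)


def col2_keys(raw: str) -> dict:
    """Return the type 1 (full middle) and type 2 (middle initial) strings."""
    m = COL2.match(normalize(raw))
    if m is None:
        return {}

    def g(*nums: int) -> str:
        # Concatenate groups; an unmatched optional group contributes "".
        return "".join(m.group(n) or "" for n in nums)

    head = g(1, 2) + ","                     # mirrors $1$2,
    return {"1": head + g(3, 4, 5), "2": head + g(3, 4)}


if __name__ == "__main__":
    print(col2_keys("Smith Jr.,John Bertrand  "))
```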
    

    VBA Replacement Help

    I tested this in a very old copy of Excel, in a new VBA project.
    These are two modules created to show an example.
    They both do the same thing.

    The first is a verbose example of all the replacement types possible.
    The second is a trimmed down version using just a type-2 comparison.

    I haven't written VB before, as you can tell, but it should be simple enough
    for you to see how the replacement works and how to tie in the Excel
    columns.

    If you are doing just a flat comparison, you might want to convert a column 1 value
    once, check each value in column 2 against it, then move to the next value in
    column 1 and repeat.

    The fastest way is to create 2 extra columns, convert the respective
    column values to type-2 (variables strC1_2 and strC2_2, see example), and copy them
    into the new columns.
    After that, you don't need regex: just compare the new columns, find the matching rows,
    then delete the type-2 columns.
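    Once both helper columns hold type-2 strings, the comparison is a plain equality join. Here is a Python sketch of that step. It assumes the two helper columns have been read into lists (`left` and `right` are hypothetical inputs), and row positions are 0-based, so the header offset must be added to get Excel row numbers. Indexing the second column by key once avoids rescanning it for every first-column value, as the nested loop described earlier would.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def matching_rows(left: List[str], right: List[str]) -> List[Tuple[int, int]]:
    """Return (left_row, right_row) pairs whose type-2 strings are identical."""
    # Index the second column once: type-2 string -> positions where it occurs.
    positions: Dict[str, List[int]] = defaultdict(list)
    for row, key in enumerate(right):
        positions[key].append(row)

    pairs = []
    for row, key in enumerate(left):
        # .get() looks up without adding empty entries to the defaultdict.
        for other in positions.get(key, []):
            pairs.append((row, other))
    return pairs


if __name__ == "__main__":
    col1 = ["Smith Jr,John B", "Doe,Jane", "Roe,Rick A"]
    col2 = ["Doe,Jane", "Smith Jr,John B", "Poe,Ed"]
    print(matching_rows(col1, col2))  # [(0, 1), (1, 0)]
```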

    Verbose -

    Sub RegexColumnValueComparison()
    
    ' Column 1 and 2 , Sample values
    ' These should probably be passed in values
    ' ============================================
    strC1 = "John (Johnny)   Bertrand ""Abe""   Smith, Jr.  "
    strC2 = "Smith Jr.,John Bertrand  "
    
    ' Normalization regexes for whitespace and periods
    ' (use for both column values)
    ' =============================================
    Set rxDot = CreateObject("vbscript.regexp")
    rxDot.Global = True
    rxDot.Pattern = "\."
    Set rxWSp = CreateObject("vbscript.regexp")
    rxWSp.Global = True
    rxWSp.Pattern = "\s+"
    
    ' Column 1 Regex
    ' ==================
    Set rxC1 = CreateObject("vbscript.regexp")
    rxC1.Global = False
    rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"
    
    ' Column 2 Regex
    ' ==================
    Set rxC2 = CreateObject("vbscript.regexp")
    rxC2.Global = False
    rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"
    
    ' Normalize column 1 and 2, Copy to new var
    ' ============================================
    strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
    strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")
    
    
    ' ------------------------------------------------------
    ' This section is informational
    ' Shows some sample replacements before comparison
    ' Just pick 1 replacement from each column, discard the rest
    ' ------------------------------------------------------
    
    ' Create Some Replacement Types for Column 1
    ' =====================================================
    strC1_1a = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$4$5$6")
    strC1_1b = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$5$6")
    strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")
    
    ' Create Some Replacement Types for Column 2
    ' =====================================================
    strC2_1a = rxC2.Replace(strC2_Normal, "$1$2,$3$4$5")
    strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")
    
    ' Show Types in Message Box
    ' =====================================================
    c1_t1a = "Column1 Types:" & Chr(13) & "type 1a full middle    - " & strC1_1a
    c1_t1b = "type 1b middle initial - " & strC1_1b
    c1_t2 = "type 2 middle initial - " & strC1_2
    c2_t1a = "Column2 Types:" & Chr(13) & "type 1 full middle     - " & strC2_1a
    c2_t2 = "type 2 middle initial - " & strC2_2
    
    MsgBox (c1_t1a & Chr(13) & c1_t1b & Chr(13) & c1_t2 & Chr(13) & Chr(13) & c2_t1a & Chr(13) & c2_t2)
    
    ' ------------------------------------------------------
    ' Compare a Value from Column 1 vs Column 2
    ' For this we will compare Type 2 values
    ' ------------------------------------------------------
    If strC1_2 = strC2_2 Then
       MsgBox ("Type 2 values are EQUAL: " & Chr(13) & strC1_2)
    Else
       MsgBox ("Type 2 values are NOT Equal:" & Chr(13) & strC1_2 & " != " & strC2_2)
    End If
    
    ' ------------------------------------------------------
    ' Same comparison (Type 2) of Normalized column 1,2 values
    ' In essence, this is all you need
    ' ------------------------------------------------------
    If rxC1.Replace(strC1_Normal, "$7$8,$1$3") = rxC2.Replace(strC2_Normal, "$1$2,$3$4") Then
       MsgBox ("Type 2 values are EQUAL")
    Else
       MsgBox ("Type 2 values are NOT Equal")
    End If
    
    End Sub
    

    Type 2 only -

    Sub RegexColumnValueComparison()
    
    ' Column 1 and 2 , Sample values
    ' These should probably be passed in values
    ' ============================================
    strC1 = "John (Johnny)   Bertrand ""Abe""   Smith, Jr.  "
    strC2 = "Smith Jr.,John Bertrand  "
    
    ' Normalization regexes for whitespace and periods
    ' (use for both column values)
    ' =============================================
    Set rxDot = CreateObject("vbscript.regexp")
    rxDot.Global = True
    rxDot.Pattern = "\."
    Set rxWSp = CreateObject("vbscript.regexp")
    rxWSp.Global = True
    rxWSp.Pattern = "\s+"
    
    ' Column 1 Regex
    ' ==================
    Set rxC1 = CreateObject("vbscript.regexp")
    rxC1.Global = False
    rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"
    
    ' Column 2 Regex
    ' ==================
    Set rxC2 = CreateObject("vbscript.regexp")
    rxC2.Global = False
    rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"
    
    ' Normalize column 1 and 2, Copy to new var
    ' ============================================
    strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
    strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")
    
    ' Comparison (Type 2) of Normalized column 1,2 values
    ' ============================================
    strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")
    strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")
    
    If strC1_2 = strC2_2 Then
       MsgBox ("Type 2 values are EQUAL")
    Else
       MsgBox ("Type 2 values are NOT Equal")
    End If
    
    End Sub
    

    Paren/Quote Response

    As a side note, I will need to eliminate the quotes from the nicknames and the parentheses from the preferred names.

    If I understand correctly:

    Yes, you can capture the contents inside the quotes and parentheses separately.
    It just requires some modifications. The regex below can produce a
    replacement with or without quotes and/or parentheses, or in other
    forms.

    The samples below show ways to build the replacements.

    Very Important Note here

    If you mean eliminating the quotes "" and parentheses () from the
    matching regex itself, that could be done as well, but it requires a new regex.

    The only problem is that ALL distinction between preferred/middle/nick
    is lost, because these fields were positional as well as
    delimited (i.e. (preferred) middle "nick").

    Dropping the delimiters would leave regex subexpressions like this:

    (?:[ ]([^ ,]+))?   # optional preferred
    (?:[ ]([^ ,]+))?   # optional middle
    (?:[ ]([^ ,]+))?   # optional nick
    

    And because they are all optional, they lose all positional reference, which renders the middle-initial
    expression invalid.
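    A quick way to see the problem, using a hypothetical delimiter-free pattern in Python: the regex engine fills optional groups from left to right, so a lone middle name and a lone preferred name both land in the first optional slot and cannot be told apart.

```python
import re

# Hypothetical delimiter-free pattern: first name, three optional words that are
# *meant* to be preferred / middle / nick, then a required last name.
LOOSE = re.compile(
    r"^([^ ,]+)"
    r"(?: ([^ ,]+))?"   # (2) meant to be the preferred name
    r"(?: ([^ ,]+))?"   # (3) meant to be the middle name
    r"(?: ([^ ,]+))?"   # (4) meant to be the nickname
    r" ([^ ,]+)$"       # (5) last name
)

# A middle name ends up in group 2, the "preferred" slot; group 3 stays empty.
print(LOOSE.match("John Bertrand Smith").groups())
# A preferred name ends up in exactly the same slot.
print(LOOSE.match("John Johnny Smith").groups())
```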

    End Note

    Regex template (use to formulate replacement strings)

    ^
      [ ]?
    
    # (required) 
       # First
       #  $1  name
       # -----------------------------------------
        ([^\ ,()"']+)                 # (1) name     
    
    # (optional)
       # Parenthetical 'preferred'
       #  $2    all
       #  $3$4  name
       # -----------------------------------------
        (?: (                         #  (2)   all  
               ([ ]) \( ([^)]*) \)    #  (3,4) space and name
            )
        )?  
    
    # (optional)
      # Middle
      #   $5    initial
      #   $5$6  name
      # -----------------------------------------
        (?:  ([ ] [^\ ,()"'] )       #  (5) first character
             ([^\ ,()"']*)           #  (6) remaining characters
    
        )?                                   
    
    # (optional)
       # Quoted nick                       
       #  $7$8$9$8  all
       #  $7$9      name
       # -----------------------------------------
        (?: ([ ])                    # (7) space
            (["'])                   # (8) quote
            (.*?)                    # (9) name
            \8 
        )?
    
    # (required)
       #  Last
       #  $10  name
       # -----------------------------------------
        [ ] ([^\ ,()"']+)            # (10) name
    
    # (optional)
       # Suffix 
       #  $11  suffix
       # -----------------------------------------
        (?:    [, ]* ([ ].+?) [ ]?   # (11) suffix
            |  .*?
        )?
    $
    

    VBA regex (2nd edition, tested in my VBA project from above)

    rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:(([ ])\(([^)]*)\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ])([""'])(.*?)\8)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"
    
    
    strC1_1a  = rxC1.Replace( strC1_Normal, "$10$11,$1$2$5$6$7$8$9$8" )
    strC1_1aa = rxC1.Replace( strC1_Normal, "$10$11,$1$3$4$5$6$7$9" )
    strC1_1b  = rxC1.Replace( strC1_Normal, "$10$11,$1$2$5$7$8$9$8" )
    strC1_1bb = rxC1.Replace( strC1_Normal, "$10$11,$1$3$4$5$7$9" )
    strC1_2   = rxC1.Replace( strC1_Normal, "$10$11,$1$5" )
    

    Sample input/output possibilities

    Input (raw)                 = 'John (Johnny) Bertrand "Abe" Smith, Jr.  '
    
    Out type 1a  full middle    = 'Smith Jr,John (Johnny) Bertrand "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Johnny Bertrand Abe'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny) B "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Johnny B Abe'
    Out type 2   middle initial = 'Smith Jr,John B'
    
    Input (raw)                 = 'John  (Johnny)  Smith, Jr.'
    
    Out type 1a  full middle    = 'Smith Jr,John (Johnny)'
    Out type 1aa full middle    = 'Smith Jr,John Johnny'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny)'
    Out type 1bb middle initial = 'Smith Jr,John Johnny'
    Out type 2   middle initial = 'Smith Jr,John'
    
    
    Input (raw)                 = 'John  (Johnny)  "Abe" Smith, Jr.'
    
    Out type 1a  full middle    = 'Smith Jr,John (Johnny) "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Johnny Abe'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny) "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Johnny Abe'
    Out type 2   middle initial = 'Smith Jr,John'
    
    
    Input (raw)                 = 'John   "Abe" Smith, Jr.'
    
    Out type 1a  full middle    = 'Smith Jr,John "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Abe'
    Out type 1b  middle initial = 'Smith Jr,John "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Abe'
    Out type 2   middle initial = 'Smith Jr,John'
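    A Python sketch of the second-edition pattern, with each replacement type expressed as a list of group numbers instead of a `$n` string (Python's `re` has no `$n` syntax; `types_v2` and `TYPES` are hypothetical names). As the sample outputs above imply, an optional group that did not participate contributes an empty string.

```python
import re


def normalize(value: str) -> str:
    # Strip periods, then collapse whitespace runs to a single space.
    return re.sub(r"\s+", " ", value.replace(".", ""))


# Second-edition column 1 regex, verbose form of the template above.
COL1_V2 = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                           # (1) first name
    (?: ( ([ ]) \( ([^)]*) \) ) )?          # (2) space + (pref), (3) space, (4) pref
    (?: ([ ] [^\ ,()"']) ([^\ ,()"']*) )?   # (5) middle initial, (6) rest of middle
    (?: ([ ]) (["']) (.*?) \8 )?            # (7) space, (8) quote, (9) nickname
    [ ] ([^\ ,()"']+)                       # (10) last name
    (?: [, ]* ([ ].+?) [ ]? | .*? )?        # (11) suffix
    $
""", re.VERBOSE)

# Group numbers after "$10$11," for each replacement type listed above.
TYPES = {
    "1a": (1, 2, 5, 6, 7, 8, 9, 8),
    "1aa": (1, 3, 4, 5, 6, 7, 9),
    "1b": (1, 2, 5, 7, 8, 9, 8),
    "1bb": (1, 3, 4, 5, 7, 9),
    "2": (1, 5),
}


def types_v2(raw: str) -> dict:
    m = COL1_V2.match(normalize(raw))
    if m is None:
        return {}

    def g(*nums: int) -> str:
        # Concatenate groups; an unmatched optional group contributes "".
        return "".join(m.group(n) or "" for n in nums)

    head = g(10, 11) + ","
    return {name: head + g(*groups) for name, groups in TYPES.items()}


if __name__ == "__main__":
    for kind, text in types_v2('John (Johnny) Bertrand "Abe" Smith, Jr.  ').items():
        print(f"{kind:4} {text}")
```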
    

    Re: 4/17 concern

    last names that have 2 or more words. Would the allowance for certain literal names, rather than generic word patterns, be the solution?

    Actually, no, it wouldn't. For your form, allowing multiple words in the last name
    injects the space field delimiter into the last name field.

    However, for your particular form it can be done, since the only handicap is when the
    "nick" field is missing. When it is missing, and given that the middle name is a single
    word, two permutations are possible.

    Hopefully you can get the solution from the 3 regexes and the test case output below. The regexes exclude the space delimiters from the captures, so you can either compose
    replacements with the Replace method, or just store the captured groups and compare them to
    the captures from the other column.

    Nick_rx.Pattern (template)
    
    * This pattern is multi-word last name, NICK is required 
    
    ^
      [ ]?
    
       # First (req'd)
        ([^\ ,()"']+)              # (1) first name
    
       # Preferred first
        (?: [ ]
           (                       # (2) (preferred), -or-
             \( ([^)]*?) \)        # (3) preferred
           )
        )?  
    
       # Middle
        (?: [ ]
            (                      # (4) full middle, -or-
              ([^\ ,()"'])         # (5) initial
              [^\ ,()"']*
            )
        )?
    
       # Quoted nick (req'd)
         [ ]
         (                         # (6) "nick",
           (["'])   # (7) n/a        -or-
           (.*?)                   # (8)  nick
           \7
         )
    
       # Single/Multi Last (req'd)
        [ ]
        (                          # (9) multi/single word last name
          [^\ ,()"']+
          (?:[ ][^\ ,()"']+)*
        )
    
       # Suffix 
        (?: [ ]? , [ ]? (.*?) )?   # (10) suffix
    
      [ ]?
    $
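    A Python sketch of the Nick form, useful for checking the group numbers against the test cases further down. It assumes Python 3.5+, where `Match.expand` replaces unmatched groups with an empty string. `build` is a hypothetical helper that mimics the `rxWSp.Replace(Nick_rx.Replace(...), " ")` calls in the pseudocode. Python writes group references as `\g<n>` rather than `$n`.

```python
import re


def normalize(value: str) -> str:
    # Strip periods, then collapse whitespace runs to a single space.
    return re.sub(r"\s+", " ", value.replace(".", ""))


# Nick_rx template in Python verbose form (same group numbers as above).
NICK = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                                 # (1) first name
    (?: [ ] ( \( ([^)]*?) \) ) )?                 # (2) (preferred), (3) preferred
    (?: [ ] ( ([^\ ,()"']) [^\ ,()"']* ) )?       # (4) full middle, (5) initial
    [ ] ( (["']) (.*?) \7 )                       # (6) quoted nick, (7) quote, (8) nick
    [ ] ( [^\ ,()"']+ (?: [ ] [^\ ,()"']+ )* )    # (9) single/multi-word last name
    (?: [ ]? , [ ]? (.*?) )?                      # (10) suffix
    [ ]?
    $
""", re.VERBOSE)


def build(m, template):
    # Expand \g<n> references (unmatched groups become ""), then collapse
    # repeated spaces, like the rxWSp.Replace(...) wrapper in the pseudocode.
    return re.sub(r"\s+", " ", m.expand(template))


if __name__ == "__main__":
    m = NICK.match(normalize('John1 (JJ) Bert "nick" St Van Helsing ,Jr '))
    print(repr(build(m, r"\g<9> \g<10> , \g<1> \g<3> \g<5> \g<8> ")))  # type 1bb
    print(repr(build(m, r"\g<9> \g<10> , \g<1> \g<5> ")))              # type 2
```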
    
    -----------------------------------
    
    FLs_rx.Pattern (template)
    
    * This pattern has no MIDDLE/NICK, is single-word last name,
    * and has no permutations.
    
    ^
      [ ]?
    
       # First (req'd)
        ([^\ ,()"']+)              # (1) first name
    
       # Preferred first
        (?: [ ]
           (                       # (2) (preferred), -or-
             \( ([^)]*?) \)        # (3)  preferred
           )
        )?  
    
       # Single Last (req'd)
         [ ]
         ([^\ ,()"']+)             # (4) single word last name
    
       # Suffix 
        (?: [ ]? , [ ]? (.*?) )?   # (5) suffix
    
      [ ]?
    $
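    A Python check of the FLs form (input assumed already normalized). It shows the capture groups for a one-word last name, and it shows that a multi-word last name is rejected, which is what keeps this form free of permutations.

```python
import re

# FLs_rx template: first (+ optional preferred), one-word last, optional suffix.
FLS = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                     # (1) first name
    (?: [ ] ( \( ([^)]*?) \) ) )?     # (2) (preferred), (3) preferred
    [ ] ([^\ ,()"']+)                 # (4) single-word last name
    (?: [ ]? , [ ]? (.*?) )?          # (5) suffix
    [ ]?
    $
""", re.VERBOSE)

if __name__ == "__main__":
    print(FLS.match("John5 (JJ) Helsing ").groups())
    print(FLS.match("John Smith, Jr").groups())
    # More than one word after the first name: no match, so FLm_rx is tried next.
    print(FLS.match("John6 (JJ) Bert St Van Helsing ,Jr "))
```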
    
    -----------------------------------
    
    FLm_rx.Pattern (template)
    
    * This pattern has no NICK, is multi-word last name,
    * and has 2 permutations.
    * 1. Middle as part of Last name.
    * 2. Middle is separate from Last name.
    
    ^
      [ ]?
    
       # First (req'd)
        ([^\ ,()"']+)              # (1) first name
    
       # Preferred first
        (?: [ ]
           (                       # (2) (preferred), -or-
             \( ([^)]*?) \)        # (3)  preferred
           )
        )?  
    
       # Multi Last (req'd)
        [ ]
        (                         # (4) Multi, as Middle + Last,
                                  # -or-
           (?:                         # Middle
              (                        # (5) full middle, -or-
                 ([^\ ,()"'])          # (6) initial
                 [^\ ,()"']*
              )
              [ ]
           )
                                       # Last (req'd)
           (                           # (7) multi/single word last name
              [^\ ,()"']+
              (?:[ ][^\ ,()"']+)* 
           )
        )
    
       # Suffix 
        (?: [ ]? , [ ]? (.*?) )?   # (8) suffix
    
      [ ]?
    $
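    A Python sketch of the two FLm permutations, both built from the same match. The input is assumed already normalized, `build` is a hypothetical helper, and Python 3.5+ is assumed for `Match.expand`. Group 4 treats the middle word as part of the last name; groups 7 and 6 split it off as a middle initial.

```python
import re

# FLm_rx template: first (+ optional preferred), then two or more words that are
# either [middle + last] or [all last name], then an optional suffix.
FLM = re.compile(r"""
    ^ [ ]?
    ([^\ ,()"']+)                                  # (1) first name
    (?: [ ] ( \( ([^)]*?) \) ) )?                  # (2) (preferred), (3) preferred
    [ ]
    (                                              # (4) middle + last as one string
      (?: ( ([^\ ,()"']) [^\ ,()"']* ) [ ] )       # (5) full middle, (6) initial
      ( [^\ ,()"']+ (?: [ ] [^\ ,()"']+ )* )       # (7) single/multi-word last name
    )
    (?: [ ]? , [ ]? (.*?) )?                       # (8) suffix
    [ ]?
    $
""", re.VERBOSE)


def build(m, template):
    # Expand \g<n> references (unmatched groups become ""), then collapse spaces.
    return re.sub(r"\s+", " ", m.expand(template))


if __name__ == "__main__":
    m = FLM.match("John7 Bert St Van Helsing ,Jr ")
    print(repr(build(m, r"\g<4> \g<8> , \g<1> ")))        # permutation 1, type 2
    print(repr(build(m, r"\g<7> \g<8> , \g<1> \g<6> ")))  # permutation 2, type 2
```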
    
    -----------------------------------
    
    These regexes are mutually exclusive and should be checked
    in an if-then-else chain like this (pseudocode):
    
    str_Normal = rxDot.Replace(rxWSp.Replace(strValue, " "), "")
    
    If Nick_rx.Test(str_Normal) Then
         N_1a  = rxWSp.Replace( Nick_rx.Replace(str_Normal, "$9 $10 , $1 $2 $4 $6 "), " ")
         N_1aa = rxWSp.Replace( Nick_rx.Replace(str_Normal, "$9 $10 , $1 $3 $4 $8 "), " ")
         N_1b  = rxWSp.Replace( Nick_rx.Replace(str_Normal, "$9 $10 , $1 $2 $5 $6 "), " ")
         N_1bb = rxWSp.Replace( Nick_rx.Replace(str_Normal, "$9 $10 , $1 $3 $5 $8 "), " ")
         N_2   = rxWSp.Replace( Nick_rx.Replace(str_Normal, "$9 $10 , $1 $5 "), " ")
    
         ' see test case results in output below
    
    ElseIf FLs_rx.Test(str_Normal) Then
    
         FLs_1a  = rxWSp.Replace( FLs_rx.Replace(str_Normal, "$4 $5 , $1 $2 "), " ")
         FLs_1aa = rxWSp.Replace( FLs_rx.Replace(str_Normal, "$4 $5 , $1 $3 "), " ")
         FLs_2   = rxWSp.Replace( FLs_rx.Replace(str_Normal, "$4 $5 , $1 "), " ")
    
    ElseIf FLm_rx.Test(str_Normal) Then
    
       ' Permutation 1:
         FLm1_1a  = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$4 $8 , $1 $2 "), " ")
         FLm1_1aa = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$4 $8 , $1 $3 "), " ")
         FLm1_2   = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$4 $8 , $1 "), " ")
    
       ' Permutation 2:
         FLm2_1a  = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$7 $8 , $1 $2 $5 "), " ")
         FLm2_1aa = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$7 $8 , $1 $3 $5 "), " ")
         FLm2_1b  = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$7 $8 , $1 $2 $6 "), " ")
         FLm2_1bb = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$7 $8 , $1 $3 $6 "), " ")
         FLm2_2   = rxWSp.Replace( FLm_rx.Replace(str_Normal, "$7 $8 , $1 $6 "), " ")
    
       ' At this point, the odds are that only one of these permutations will match
       ' a value in the other column.
    
    Else
    
         ' The data could not be matched against a valid form
    End If
    
    -----------------------------
    
    Test Cases
    
    Found form 'Nick'
    Input (raw)                 = 'John1 (JJ) Bert "nick" St Van Helsing ,Jr '
    Normal                      = 'John1 (JJ) Bert "nick" St Van Helsing ,Jr '
    
    Out type 1a  full middle    = 'St Van Helsing Jr , John1 (JJ) Bert "nick" '
    Out type 1aa full middle    = 'St Van Helsing Jr , John1 JJ Bert nick '
    Out type 1b  middle initial = 'St Van Helsing Jr , John1 (JJ) B "nick" '
    Out type 1bb middle initial = 'St Van Helsing Jr , John1 JJ B nick '
    Out type 2   middle initial = 'St Van Helsing Jr , John1 B '
    
    =======================================================
    
    Found form 'Nick'
    Input (raw)                 = 'John2 Bert "nick" Helsing ,Jr '
    Normal                      = 'John2 Bert "nick" Helsing ,Jr '
    
    Out type 1a  full middle    = 'Helsing Jr , John2 Bert "nick" '
    Out type 1aa full middle    = 'Helsing Jr , John2 Bert nick '
    Out type 1b  middle initial = 'Helsing Jr , John2 B "nick" '
    Out type 1bb middle initial = 'Helsing Jr , John2 B nick '
    Out type 2   middle initial = 'Helsing Jr , John2 B '
    
    =======================================================
    
    Found form 'Nick'
    Input (raw)                 = 'John3 Bert "nick" St Van Helsing ,Jr '
    Normal                      = 'John3 Bert "nick" St Van Helsing ,Jr '
    
    Out type 1a  full middle    = 'St Van Helsing Jr , John3 Bert "nick" '
    Out type 1aa full middle    = 'St Van Helsing Jr , John3 Bert nick '
    Out type 1b  middle initial = 'St Van Helsing Jr , John3 B "nick" '
    Out type 1bb middle initial = 'St Van Helsing Jr , John3 B nick '
    Out type 2   middle initial = 'St Van Helsing Jr , John3 B '
    
    =======================================================
    
    Found form 'First-Last (single)'
    Input (raw)                 = 'John4 Helsing '
    Normal                      = 'John4 Helsing '
    
    Out type 1a  no middle      = 'Helsing  , John4  '
    Out type 1aa no middle      = 'Helsing  , John4  '
    Out type 2                  = 'Helsing  , John4 '
    
    =======================================================
    
    Found form 'First-Last (single)'
    Input (raw)                 = 'John5 (JJ) Helsing '
    Normal                      = 'John5 (JJ) Helsing '
    
    Out type 1a  no middle      = 'Helsing  , John5 (JJ) '
    Out type 1aa no middle      = 'Helsing  , John5 JJ '
    Out type 2                  = 'Helsing  , John5 '
    
    =======================================================
    
    Found form 'First-Last (multi)'
    Input (raw)                 = 'John6 (JJ) Bert St Van Helsing ,Jr '
    Normal                      = 'John6 (JJ) Bert St Van Helsing ,Jr '
    
    Permutation 1:
    Out type 1a  no middle      = 'Bert St Van Helsing Jr , John6 (JJ) '
    Out type 1aa no middle      = 'Bert St Van Helsing Jr , John6 JJ '
    Out type 2                  = 'Bert St Van Helsing Jr , John6 '
    Permutation 2:
    Out type 1a  full middle    = 'St Van Helsing Jr , John6 (JJ) Bert '
    Out type 1aa full middle    = 'St Van Helsing Jr , John6 JJ Bert '
    Out type 1b  middle initial = 'St Van Helsing Jr , John6 (JJ) B '
    Out type 1bb middle initial = 'St Van Helsing Jr , John6 JJ B '
    Out type 2   middle initial = 'St Van Helsing Jr , John6 B '
    
    =======================================================
    
    Found form 'First-Last (multi)'
    Input (raw)                 = 'John7 Bert St Van Helsing ,Jr '
    Normal                      = 'John7 Bert St Van Helsing ,Jr '
    
    Permutation 1:
    Out type 1a  no middle      = 'Bert St Van Helsing Jr , John7 '
    Out type 1aa no middle      = 'Bert St Van Helsing Jr , John7 '
    Out type 2                  = 'Bert St Van Helsing Jr , John7 '
    Permutation 2:
    Out type 1a  full middle    = 'St Van Helsing Jr , John7 Bert '
    Out type 1aa full middle    = 'St Van Helsing Jr , John7 Bert '
    Out type 1b  middle initial = 'St Van Helsing Jr , John7 B '
    Out type 1bb middle initial = 'St Van Helsing Jr , John7 B '
    Out type 2   middle initial = 'St Van Helsing Jr , John7 B '
    
    =======================================================
    
    Form  ***  (unknown)
    Input (raw)                 = ' do(e)s not. match ,'
    Normal                      = ' do(e)s not match ,'
    
    =======================================================