regex regex-group

Having an unknown problem with Regex processing names


I'm parsing name strings that contain unusual compound forms. The ones currently giving me trouble are names like these:

Edward St. Loe Livermore
Henry St. George Tucker III
Henry St. John

This pattern (.*)(St\.\s\w+)\s(.*) parses the first two names and completely ignores the third.

This pattern (.*)(St\.\s\w+)|(St\.\s\w+\s(.*))$ returns the third name as well, but leaves off the surname of the first two.

I'm using https://regex101.com/ to test the regex pattern.

So far I can't figure out what pattern will return the surname in the match for all three names, or whether I need a conditional statement in my code to parse the three-element names separately, which seems inefficient.

TIA


Solution

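  • Use this regex, which makes the whitespace after the St. part optional (\s* instead of \s), so a name that ends right there, like Henry St. John, still matches: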

    (.*)(St\.\s\w+)\s*.*
    

    The regular expression matches as follows:

    Node    Explanation
    (       group and capture to \1:
    .*        any character except \n (0 or more times, matching as many as possible)
    )       end of \1
    (       group and capture to \2:
    St        'St'
    \.        a literal '.'
    \s        whitespace (\n, \r, \t, \f, and " ")
    \w+       word characters (a-z, A-Z, 0-9, _) (1 or more times, matching as many as possible)
    )       end of \2
    \s*     whitespace (\n, \r, \t, \f, and " ") (0 or more times, matching as many as possible)
    .*      any character except \n (0 or more times, matching as many as possible)
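To check the pattern against the three names from the question, here is a minimal Python sketch. It assumes one change that is not in the answer above: the trailing .* is wrapped in a third group so the surname comes back as its own capture instead of only being part of the overall match. The split_name helper and the choice of Python are illustrative, not something the question specifies.

```python
import re

names = [
    "Edward St. Loe Livermore",
    "Henry St. George Tucker III",
    "Henry St. John",
]

# Pattern from the answer, with one assumed change: the final .* is wrapped
# in a third group so the trailing surname is captured, not just matched.
PATTERN = re.compile(r"(.*)(St\.\s\w+)\s*(.*)")

# The first pattern from the question, kept for comparison. It requires
# whitespace after the "St. X" part, so it cannot match "Henry St. John".
ORIGINAL = re.compile(r"(.*)(St\.\s\w+)\s(.*)")


def split_name(name):
    """Hypothetical helper: return (first, st_part, rest), or None if no match."""
    m = PATTERN.fullmatch(name)
    if m is None:
        return None
    first, st_part, rest = m.groups()
    # Group 1 keeps the space before "St.", so trim it here.
    return first.strip(), st_part, rest


results = [split_name(n) for n in names]
for name, parts in zip(names, results):
    print(name, "->", parts)
```

For the third name the last element is an empty string, since nothing follows "St. John". Whether "St. John" should itself count as the surname in that case is a decision for the calling code; the regex only splits the string.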