REGEX Name and any surname

In the example below, I want to make 2 groups in a regex:

Name FirstSurname SecondSurname ..

The first group would be Name

The second would be FirstSurname SecondSurname ...

^(\w+)(.*)$   - would capture all
\w+           - would make n groups (number of words).

I want only 2 groups: the first name in one, and everything that follows in the other.

Any help?

Solution

First, as someone with punctuation in my given name :-) PLEASE don't use \w to try to match names :-) … both - and ' are not uncommon.

Using Perl, for example:

  if ("Bruce-Robert Fenn Pocock" =~ /^(\w+)(.*)$/) { print "First: $1    Rest: $2" }

  → First: Bruce    Rest: -Robert Fenn Pocock

Perhaps just group all leading non-space characters, then skip the whitespace that follows before capturing the rest:

  if ("Bruce-Robert Fenn Pocock" =~ /^(\S+)\s*(.*)$/) { print "First: $1    Rest: $2" }

  → First: Bruce-Robert    Rest: Fenn Pocock
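
For comparison, here is a minimal sketch of the same pattern in Python. The helper name split_name is made up for illustration, and it assumes (as above) that the given name is simply the first whitespace-delimited token:

```python
import re

# Hypothetical helper, for illustration only: group 1 is the first
# whitespace-delimited token, group 2 is everything after the gap.
NAME_RE = re.compile(r"^(\S+)\s*(.*)$")

def split_name(full_name):
    """Return (first, rest); rest is "" when there is only one token."""
    match = NAME_RE.match(full_name.strip())
    if match is None:  # empty or whitespace-only input
        return None
    return match.group(1), match.group(2)

print(split_name("Bruce-Robert Fenn Pocock"))
# → ('Bruce-Robert', 'Fenn Pocock')
```

A single-token name such as "Cher" still matches, with an empty second group, because \s* and .* are both allowed to match nothing.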

Of course, if you run across people with middle names in your dataset, there's no way to tell them apart from matronym-patronym pairs or multi-part last names.

I hope/assume you don't have honorifics in your input, either.

First: Don         Rest: Juan de la Mancha
     *** wrong: Don is honorific
First: Diego       Rest: de la Vega
First: John        Rest: Jacob Smith
     *** wrong: Jacob is probably a middle name
First: De'shawna   Rest: Cummings
First: Wernher     Rest: von Braun
First: Oscar       Rest: Vazquez-Oliverez

Ultimately, the only way to accurately break down a name into an honorific, given name, middle name(s), surnames (matronym, patronym), and suffix(es), is to ask.

(E.g. in my own name, "Fenn" is considered a "middle name" in Anglo circles, but is interpreted as a matronym in Latino circles.)

Honorifics and suffixes can often be guessed at from a list, but the list is long once you include e.g. military titles and doctoral suffixes ("Dr John Doe, Pharm.D", "Maj. Gen. Thomas Ts'o"), and it is not unambiguous (e.g. "Don" is both a short form of "Donald" and an honorific).
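
To make the "guess from a list" idea concrete, here is a rough Python sketch. The HONORIFICS set and the strip_honorifics name are invented for this example, the list is deliberately tiny, and suffixes are not handled at all; it is only meant to show how the guess works and where it goes wrong:

```python
# Hypothetical, deliberately tiny list; a real one would be far longer.
HONORIFICS = {"dr", "mr", "mrs", "ms", "maj", "gen", "don"}

def strip_honorifics(full_name):
    """Guess leading honorifics; return (titles, remaining_tokens)."""
    tokens = full_name.split()
    titles = []
    # Always leave at least one token as the name itself.
    while len(tokens) > 1 and tokens[0].rstrip(".").lower() in HONORIFICS:
        titles.append(tokens.pop(0))
    return titles, tokens

print(strip_honorifics("Maj. Gen. Thomas Ts'o"))
# → (['Maj.', 'Gen.'], ['Thomas', "Ts'o"])
print(strip_honorifics("Don Smith"))
# → (['Don'], ['Smith'])   *** wrong if this Don is a Donald
```

The second call shows the ambiguity: a list lookup cannot tell whether "Don" is a title or a given name.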

PS. Lovely article here:

http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/