I am trying to extract a first name from a text snippet that optionally has a last name on the same line, in the form: <first_name>name<last_name>
E.g.:
Text: JohnnameSnow -> Result: John
Text: John -> Result: John
So I want to extract the <first_name> part from that line, but if there is no name<last_name>, it should return the full line.
I have tried the following Regex:
([A-zÀ-ÿ-]{2,})(?=(?:name))
That works fine if there's actually a last name in the same line, but it does not return the full line when there is not. Unfortunately, the solution doesn't seem to be as easy as adding |$.
Can I look for an optional end word and ignore it if it does not occur?
You can use
^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$
See the regex demo.
Details
^ - start of string
(?<first>\p{L}+?) - Group "first": one or more letters, but as few as possible
(?:name(?<last>\p{L}+))? - an optional non-capturing group:
  name - a substring
  (?<last>\p{L}+) - Group "last": one or more letters
$ - end of string.
See the C# demo:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var strings = new List<string> { "JohnnameSnow", "John" };
foreach (var s in strings)
{
    Console.WriteLine(s);
    // The optional group matches "name" plus the last name only when present.
    var m = Regex.Match(s, @"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$");
    if (m.Success)
    {
        Console.WriteLine("First name: {0}, Last name = {1}", m.Groups["first"].Value, m.Groups["last"].Value);
    }
    else
    {
        Console.WriteLine("No match!");
    }
}
Output:
JohnnameSnow
First name: John, Last name = Snow
John
First name: John, Last name =
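If you only need the first name, the same pattern can be wrapped in a small helper. This is a minimal sketch, not part of the demo above: ExtractFirstName is a hypothetical name, and returning the whole line when the pattern does not match at all (for example, when the line contains a space) is an assumption about the desired fallback. The last lines show why the lazy quantifier matters: with a greedy \p{L}+ the first group consumes the entire line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (an illustration, not from the original answer):
// returns the "first" group, or the unchanged line if the pattern fails.
static string ExtractFirstName(string line)
{
    var m = Regex.Match(line, @"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$");
    // Assumption: fall back to the full line when nothing matches.
    return m.Success ? m.Groups["first"].Value : line;
}

Console.WriteLine(ExtractFirstName("JohnnameSnow")); // John
Console.WriteLine(ExtractFirstName("John"));         // John

// For comparison: a greedy \p{L}+ takes all letters up to the end of the line,
// so the optional group is skipped and "first" holds the whole line.
var greedy = Regex.Match("JohnnameSnow", @"^(?<first>\p{L}+)(?:name(?<last>\p{L}+))?$");
Console.WriteLine(greedy.Groups["first"].Value);     // JohnnameSnow
```

Note that a first name that itself contains the substring "name" followed by letters would be split at that point, since the lazy group stops at the earliest position where the rest of the pattern can match.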