How to match values ending with an optional string through Regex?


I am trying to extract a first name from a text snippet that optionally has a last name on the same line, in the form: <first_name>name<last_name>

E.g.:

Text: JohnnameSnow -> Result: John
Text: John -> Result: John

So I want to extract the <first_name> part from that line, but if there is no name<last_name> it should return the full line.

I have tried the following Regex:

([A-zÀ-ÿ-]{2,})(?=(?:name))

That works fine if there is actually a last name on the same line, but it does not return the full line when there is none. Unfortunately, the solution does not seem to be as easy as adding |$.

Can I look for an optional end word and ignore it if it does not occur?


Solution

  • You can use

    ^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$
    

    See the regex demo.

    Details

    • ^ - start of string
    • (?<first>\p{L}+?) - Group "first": one or more letters, but as few as possible (lazy, so that the optional group that follows gets a chance to match name and the last name)
    • (?:name(?<last>\p{L}+))? - an optional non-capturing group:
      • name - a substring
      • (?<last>\p{L}+) - Group "last": one or more letters
    • $ - end of string.
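
To see why the lazy quantifier matters, here is a small sketch that compares the pattern above with a greedy variant. The greedy pattern, the class name `LazyVsGreedy`, and the helper `First` are assumptions made only for illustration; they are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // Pattern from the answer: the lazy +? stops as early as possible,
    // so the optional group can claim "name" plus the last name.
    public const string Lazy = @"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$";

    // Illustrative variant: the greedy + consumes the whole line first,
    // and the optional group then matches nothing.
    public const string Greedy = @"^(?<first>\p{L}+)(?:name(?<last>\p{L}+))?$";

    // Returns the "first" group, or an empty string when there is no match.
    public static string First(string pattern, string input)
    {
        var m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["first"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(First(Lazy, "JohnnameSnow"));   // John
        Console.WriteLine(First(Greedy, "JohnnameSnow")); // JohnnameSnow
        Console.WriteLine(First(Lazy, "John"));           // John
    }
}
```

With the greedy version, the whole line ends up in "first" because the engine never needs to backtrack: the optional group is allowed to match nothing, and $ already succeeds.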

    See the C# demo:

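    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    var strings = new List<string> { "JohnnameSnow", "John" };
    foreach (var s in strings)
    {
        Console.WriteLine(s);
        // Lazy "first" group, then an optional "name<last>" tail anchored at the end of the string
        var m = Regex.Match(s, @"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$");
        if (m.Success)
        {
            // The "last" group is empty when the optional tail did not take part in the match
            Console.WriteLine("First name: {0}, Last name = {1}", m.Groups["first"].Value, m.Groups["last"].Value);
        }
        else
        {
            Console.WriteLine("No match!");
        }
    }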
    

    Output:

    JohnnameSnow
    First name: John, Last name = Snow
    John
    First name: John, Last name =
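
If only the first name is needed, the same pattern can be wrapped in a small helper. This is a minimal sketch, not part of the answer: the names `NameParser` and `ExtractFirstName` are hypothetical, and falling back to the full line when the pattern does not match is an assumption based on the question's requirement to "return the full line".

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameParser
{
    // Same pattern as in the answer; compiled once and reused.
    private static readonly Regex NamePattern =
        new Regex(@"^(?<first>\p{L}+?)(?:name(?<last>\p{L}+))?$");

    // Returns the first name; falls back to the unchanged line when the
    // pattern does not match (assumption: that is the desired behaviour).
    public static string ExtractFirstName(string line)
    {
        var m = NamePattern.Match(line);
        return m.Success ? m.Groups["first"].Value : line;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractFirstName("JohnnameSnow")); // John
        Console.WriteLine(ExtractFirstName("John"));         // John
        // \p{L} does not include '-', so this line does not match
        // and is returned as-is by the fallback.
        Console.WriteLine(ExtractFirstName("Jean-Luc"));     // Jean-Luc
    }
}
```

Note that \p{L} matches letters only, while the character class in the question also allowed a hyphen. If hyphenated names should be parsed rather than passed through by the fallback, a class such as [\p{L}-] could be used in place of \p{L}.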