c# regex console

How to read RegEx Captures in C#


I want to ask a user for their phone number in the console, check it against a RegEx, then capture the digits so I can format them the way I want. I've got all that working except the RegEx capture part. How do I get the capture values into C# variables?

static void askPhoneNumber()
{
    String pattern = @"[(]?(\d{3})[)]?[ -.]?(\d{3})[ -.]?(\d{4})";

    System.Console.WriteLine("What is your phone number?");
    String phoneNumber = Console.ReadLine();

    while (!Regex.IsMatch(phoneNumber, pattern))
    {
        Console.WriteLine("Bad Input");
        phoneNumber = Console.ReadLine();
    }

    Match match = Regex.Match(phoneNumber, pattern);
    Capture capture = match.Groups.Captures;

    System.Console.WriteLine(capture[1].Value + "-" + capture[2].Value + "-" + capture[3].Value);
}

Solution

  • The C# regex API can be quite confusing. There are groups and captures:

    • A group corresponds to a capturing group in the pattern; it is used to extract a substring from the text.
    • A group can have several captures if it appears inside a quantifier.

    The hierarchy is:

    • Match
      • Group
        • Capture

    (a match can have several groups, and each group can have several captures)

    For example:

    Subject: aabcabbc
    Pattern: ^(?:(a+b+)c)+$
    

    In this example, there is only one group: (a+b+). This group is inside a quantifier, and is matched twice. It generates two captures: aab and abb:

    aabcabbc
    ^^^ ^^^
    Cap1  Cap2
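
To see the Match → Group → Capture hierarchy from code, here is a minimal sketch that runs the example pattern and lists what group 1 captured. The class and method names (CaptureDemo, GroupOneCaptures) are made up for illustration; only Regex.Match, Match.Groups and Group.Captures come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Returns every capture recorded by group 1, in match order.
    // (Hypothetical helper, named for this example only.)
    public static string[] GroupOneCaptures(string subject)
    {
        Match match = Regex.Match(subject, @"^(?:(a+b+)c)+$");
        if (!match.Success)
        {
            return new string[0];
        }

        Group group = match.Groups[1];
        string[] values = new string[group.Captures.Count];
        for (int i = 0; i < group.Captures.Count; i++)
        {
            values[i] = group.Captures[i].Value;
        }
        return values;
    }

    // Group.Value only exposes the last capture of the group.
    public static string GroupOneValue(string subject)
    {
        return Regex.Match(subject, @"^(?:(a+b+)c)+$").Groups[1].Value;
    }

    public static void Main()
    {
        foreach (string value in GroupOneCaptures("aabcabbc"))
        {
            Console.WriteLine(value);                    // aab, then abb
        }
        Console.WriteLine(GroupOneValue("aabcabbc"));    // abb
    }
}
```

Note that Groups[1].Value returns only the last capture (abb here), so the Captures collection is needed only when a group can match more than once.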
    

    When a group is not inside a quantifier, it generates only one capture. In your case, you have 3 groups, and each group captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the 3 substrings you're interested in, without resorting to the capture notion at all. (Captures is a property of each individual Group, not of the match.Groups collection, which is why match.Groups.Captures does not compile.)
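
Applied to the code in the question, that could look like the sketch below. It keeps the original pattern unchanged; the PhoneFormatter class and the TryFormat helper are names invented for this example, and the prompt loop is just one way to arrange it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatter
{
    // Pattern copied from the question, unchanged.
    private const string Pattern = @"[(]?(\d{3})[)]?[ -.]?(\d{3})[ -.]?(\d{4})";

    // Hypothetical helper: matches once and reads the three groups by index.
    public static bool TryFormat(string input, out string formatted)
    {
        Match match = Regex.Match(input, Pattern);
        if (!match.Success)
        {
            formatted = string.Empty;
            return false;
        }

        // Groups[0] is the whole match; the capturing groups start at index 1.
        formatted = match.Groups[1].Value + "-"
                  + match.Groups[2].Value + "-"
                  + match.Groups[3].Value;
        return true;
    }

    public static void Main()
    {
        Console.WriteLine("What is your phone number?");

        string input;
        // Console.ReadLine returns null at end of input, so stop there.
        while ((input = Console.ReadLine()) != null)
        {
            if (TryFormat(input, out string formatted))
            {
                Console.WriteLine(formatted);
                break;
            }
            Console.WriteLine("Bad Input");
        }
    }
}
```

Matching once and checking match.Success also avoids running the regex twice (IsMatch, then Match). One side note on the pattern itself: inside a character class, [ -.] is a range from space to '.', so it also accepts characters such as # or +. If only space, dot and hyphen are intended, [ .-] would express that.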