c# regex filehelpers

C# regular expression trouble


Problem!

I have the following input rules for a flat file (numeric input only):

  • Input might be a natural number (below 1000): 1, 10, 100, 999, ...
  • Input might be a comma-separated number surrounded by quotes (1,000 and above): "1,000", "2,000", "3,000", "10,000", ...

I have the following regular expression to validate the input: (?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22). For an input like 10, I expect 10 in the first matching group, which is exactly what I get. But for an input like "10,000", I expect 10,000 in the first matching group, and instead it is stored in the second matching group.

Example

string text1 = "\"" + "10,000" + "\"";
string text2 = "50";

string pattern = @"(\d+)|\x22([0-9]+(?:,[0-9]+){0,})\x22";

Match match1 = Regex.Match(text1, pattern);
Match match2 = Regex.Match(text2, pattern);

if (match1.Success)
{
    Console.WriteLine("Match#1 Group#1: " + match1.Groups[1].Value);
    Console.WriteLine("Match#1 Group#2: " + match1.Groups[2].Value);

    // Outputs
    // Match#1 Group#1: 
    // Match#1 Group#2: 10,000
}

if (match2.Success)
{
    Console.WriteLine("Match#2 Group#1: " + match2.Groups[1].Value);
    Console.WriteLine("Match#2 Group#2: " + match2.Groups[2].Value);

    // Outputs
    // Match#2 Group#1: 50
    // Match#2 Group#2: 
}

Expected Result

Both results in the same matching group, in this case group 1.

Questions?

  • What am I doing wrong? I'm just getting bad grouping from the regular expression matches.
  • Also, I'm using FileHelpers .NET to parse the file. Is there any other way to resolve this problem? Actually, I'm trying to implement a custom converter.

Object File

[FieldConverter(typeof(OOR_Quantity))]
public Int32 Quantity;

OOR_Quantity

internal class OOR_Quantity : ConverterBase
{
    public override object StringToField(string from)
    {
        string pattern = @"(?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22)";
        Regex regex = new Regex(pattern);

        if (regex.IsMatch(from))
        {
            Match match = regex.Match(from);
            return int.Parse(match.Groups[1].Value);
        }

        throw new ...
    }
}

Solution

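  • Group numbers are assigned purely on the basis of their positions in the regex, specifically the relative position of the opening parenthesis, (. In your regex, (\d+) is the first group and ([0-9]+(?:,[0-9]+)*) is the second. Only one alternative can take part in a given match, so a quoted number always lands in group 2 and group 1 stays empty.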

    If you want to refer to them both with the same identifier, use named groups and give them both the same name:

    @"(?:(?<NUMBER>\d+)|\x22(?<NUMBER>[0-9]+(?:,[0-9]+)*)\x22)"
    

    Now you can retrieve the captured value as match.Groups["NUMBER"].Value.
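
    As a rough sketch of how this could fit into the conversion step, the snippet below puts the named-group pattern behind a small helper. The names QuantityParser and Parse, the ^ and $ anchors, and the FormatException are assumptions made for illustration and are not part of your code. It also passes NumberStyles.AllowThousands to int.Parse, since a plain int.Parse would reject the commas in a value like 10,000.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of FileHelpers or of the original converter.
public static class QuantityParser
{
    // Both alternatives capture into the same named group, NUMBER.
    // The ^ and $ anchors are an addition so that partial matches are rejected.
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<NUMBER>\d+)|\x22(?<NUMBER>[0-9]+(?:,[0-9]+)*)\x22)$");

    public static int Parse(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            // Assumed error handling; pick whatever exception suits your converter.
            throw new FormatException("Not a recognized quantity: " + input);
        }

        // AllowThousands lets int.Parse accept the comma group separators.
        return int.Parse(
            match.Groups["NUMBER"].Value,
            NumberStyles.AllowThousands,
            CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(QuantityParser.Parse("50"));          // prints 50
        Console.WriteLine(QuantityParser.Parse("\"10,000\""));  // prints 10000
    }
}
```

    Inside your converter, StringToField could then simply return the result of such a helper, so the regex and the parsing rules live in one place.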