I have developed a regex to use in a .NET WebAPI that gets a date and a control code from a given input already formatted in final format.
I tried regex to avoid using multiple string splits.
I've been using Regex101 to test my expression and I have one that already works as expected, but I think it's too large for what it does.
Expression:
^([0-9]{2})+([0-9]{2})+([0-9]{2})[0-9](M|F)([0-9]{2})+([0-9]{2})+([0-9]{2})
// Get in format Year, Month, Day, Code(M|F), Year, Month, Day
Input:
7603259M2209058PRT<<<<<<<<<<<8
Do you have any suggestions to simplify it?
There is one issue with your regex: you quantified the two-digit capturing groups with a + quantifier, making them match one or more times. ([0-9]{2})+ matches one or more sequences of any two ASCII digits, while keeping only the last captured value in the corresponding group. See Repeating a Capturing Group vs. Capturing a Repeated Group.
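To see the "last captured value" effect in isolation, here is a minimal sketch. The class and method names (RepeatedGroupDemo, FirstGroup) are made up for illustration; only Regex.Match and Match.Groups come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question or answer.
public static class RepeatedGroupDemo
{
    // Returns the value of group 1, or an empty string when there is no match.
    public static string FirstGroup(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        const string input = "7603259M2209058PRT<<<<<<<<<<<8";

        // With +, the group is repeated over 76, 03 and 25 and keeps the last pair.
        Console.WriteLine(FirstGroup(@"^([0-9]{2})+", input)); // 25

        // Without +, the group matches exactly one pair.
        Console.WriteLine(FirstGroup(@"^([0-9]{2})", input));  // 76
    }
}
```

Your full pattern still returns the expected values for this input only because backtracking forces each quantified group down to a single repetition.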
You need to remove all + chars from your pattern, and then you can also simplify it as follows:
Use \d to match any digit while passing the RegexOptions.ECMAScript option to the Regex constructor so that it only matches ASCII digits (otherwise, \d is equal to \p{Nd} and matches any Unicode digit, see \d less efficient than [0-9]).
Instead of (M|F), use a character class, ([MF]), which is more efficient (see Why is a character class faster than alternation?).
You can use
var pattern = new Regex(@"^(\d{2})(\d{2})(\d{2})\d([MF])(\d{2})(\d{2})(\d{2})", RegexOptions.ECMAScript);
See the .NET regex demo.
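The difference RegexOptions.ECMAScript makes to \d can be checked with a short sketch. DigitOptionDemo and IsTwoDigits are hypothetical names used only for this illustration, and the Arabic-Indic digits are just one example of non-ASCII characters in the Nd category.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class DigitOptionDemo
{
    // True when the whole input consists of exactly two \d characters
    // under the given options.
    public static bool IsTwoDigits(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"^\d{2}$", options);
    }

    public static void Main()
    {
        // U+0663 and U+0664 are Arabic-Indic digits (Unicode category Nd).
        string arabicIndic = "\u0663\u0664";

        Console.WriteLine(IsTwoDigits(arabicIndic, RegexOptions.None));       // True
        Console.WriteLine(IsTwoDigits(arabicIndic, RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsTwoDigits("34", RegexOptions.ECMAScript));        // True
    }
}
```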
If you want to use an even shorter regex, you may use:
var pattern = new Regex(@"^(?:(\d{2})){3}\d([MF])(?:(\d{2})){3}", RegexOptions.ECMAScript);
var match = pattern.Match("7603259M2209058PRT<<<<<<<<<<<8");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Captures[0].Value); // => 76
Console.WriteLine(match.Groups[1].Captures[1].Value); // => 03
Console.WriteLine(match.Groups[1].Captures[2].Value); // => 25
Console.WriteLine(match.Groups[2].Value); // => M
Console.WriteLine(match.Groups[3].Captures[0].Value); // => 22
Console.WriteLine(match.Groups[3].Captures[1].Value); // => 09
Console.WriteLine(match.Groups[3].Captures[2].Value); // => 05
}
See the C# demo and this regex demo.
Note this is possible because .NET Regex allows access to all the captures inside the group stack.
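If indexing Captures one by one feels verbose, the capture stack can be read in one go. This is only a sketch: CaptureStackDemo and CapturesOf are hypothetical helper names, and Cast<Capture>() is used so the code does not rely on CaptureCollection being a generic collection.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class CaptureStackDemo
{
    // Returns every value captured by the given group, in match order.
    public static string[] CapturesOf(Match match, int groupIndex)
    {
        return match.Groups[groupIndex].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    public static void Main()
    {
        var pattern = new Regex(@"^(?:(\d{2})){3}\d([MF])(?:(\d{2})){3}", RegexOptions.ECMAScript);
        Match match = pattern.Match("7603259M2209058PRT<<<<<<<<<<<8");
        if (match.Success)
        {
            Console.WriteLine(string.Join("-", CapturesOf(match, 1))); // 76-03-25
            Console.WriteLine(match.Groups[2].Value);                  // M
            Console.WriteLine(string.Join("-", CapturesOf(match, 3))); // 22-09-05
        }
    }
}
```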