I'm trying to match a number written as a word, as digits, or as a Roman numeral. Here are some samples:
CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO
I'm pretty bad at regex; here's what I've got so far:
(CHAPTER (([0-9]+)|(/* words - see below */)|( /* roman - see below */)))
// words
(TWENTY|THIRTY|etc)?( |-)?(ONE|TWO|THREE|FOUR|FIVE|etc)?
// roman
(I|II|III|IV|V|etc)+
The pattern catches CHAPTER 1, CHAPTER 2 and CHAPTER THREE, but tries to match IV as a word (I'm guessing it's matching FIVE somehow?). TWENTY TWO doesn't match at all.
Can anyone help? Here's the full regex:
(CHAPTER (
([0-9]+)|
((TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)|
((I|II|III|IV|V)+)
))
NOTE:
The point of this is to convert these text representations to actual integers. I have methods to do this in each case, so I do need to distinguish between the various cases.
Since you've already got parsers, which hopefully fail gracefully when given something that superficially looks like valid Roman or text input but isn't, you could just call them all and see which one succeeds.
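As a rough sketch of that "call them all" idea: the three `TryParse...` methods below are hypothetical stand-ins for your existing parsers (I don't know their real names or signatures), and they are deliberately lenient and cover only a handful of words. The part that matters is the `TryParse` dispatcher at the bottom, which assumes each parser reports failure instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ChapterNumber
{
    // Hypothetical word table; a real parser would cover far more words.
    private static readonly Dictionary<string, int> Words = new Dictionary<string, int>
    {
        ["ONE"] = 1, ["TWO"] = 2, ["THREE"] = 3, ["FOUR"] = 4, ["FIVE"] = 5,
        ["TWENTY"] = 20, ["THIRTY"] = 30
    };

    private static readonly Dictionary<char, int> RomanDigits = new Dictionary<char, int>
    {
        ['I'] = 1, ['V'] = 5, ['X'] = 10, ['L'] = 50, ['C'] = 100, ['D'] = 500, ['M'] = 1000
    };

    // Digits only: no sign, no whitespace, no separators.
    public static bool TryParseArabic(string s, out int value)
    {
        return int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }

    // Lenient stand-in: subtracts a digit that precedes a larger one, adds otherwise.
    // It does not reject malformed numerals such as "IIII".
    public static bool TryParseRoman(string s, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(s)) return false;
        for (int i = 0; i < s.Length; i++)
        {
            if (!RomanDigits.TryGetValue(s[i], out int current)) { value = 0; return false; }
            bool smallerThanNext = i + 1 < s.Length
                && RomanDigits.TryGetValue(s[i + 1], out int next)
                && current < next;
            value += smallerThanNext ? -current : current;
        }
        return true;
    }

    // Lenient stand-in: sums the known words, split on spaces or hyphens.
    public static bool TryParseWords(string s, out int value)
    {
        value = 0;
        string[] parts = s.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0) return false;
        foreach (string part in parts)
        {
            if (!Words.TryGetValue(part, out int n)) { value = 0; return false; }
            value += n;
        }
        return true;
    }

    // Try each parser in turn; the first one that accepts the input wins.
    public static bool TryParse(string s, out int value)
    {
        return TryParseArabic(s, out value)
            || TryParseRoman(s, out value)
            || TryParseWords(s, out value);
    }
}
```

With these stand-ins, `ChapterNumber.TryParse("TWENTY TWO", out int n)` returns true with `n` equal to 22, and `"IV"` gives 4. If an input were acceptable to more than one parser, the order of the `||` chain decides which one handles it.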
If you don't just want to call them all, this regex should identify which parser to pass each input to.
var re = new Regex(
@"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)|(?<text>[A-Z ]+))");
It can be called, for example, as
var input = @"CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO";
foreach (Match match in re.Matches(input))
{
    if (match.Groups["arabic"].Success)
    {
        Console.WriteLine("Pass {0} to Arabic parser", match.Groups["arabic"].Value);
    }
    else if (match.Groups["roman"].Success)
    {
        Console.WriteLine("Pass {0} to Roman parser", match.Groups["roman"].Value);
    }
    else if (match.Groups["text"].Success)
    {
        Console.WriteLine("Pass {0} to Text parser", match.Groups["text"].Value);
    }
}
which results in
Pass 1 to Arabic parser
Pass 2 to Arabic parser
Pass THREE to Text parser
Pass IV to Roman parser
Pass TWENTY TWO to Text parser
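One caveat with the pattern above: a number word that starts with Roman-numeral letters (MILLION, say) would have its leading `MI` claimed by the roman group, and `[A-Z ]+` does not accept the hyphenated form TWENTY-TWO that your original pattern allowed. A possible tightening is sketched below. It is a variant of the pattern, not the one shown earlier, and the `ChapterClassifier` class and `Classify` method are names made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChapterClassifier
{
    // Variant of the pattern above (an assumption, not the original):
    //  - \b after the roman group rejects a roman-looking prefix of a longer word
    //  - the text group accepts words joined by single spaces or hyphens
    private static readonly Regex Pattern = new Regex(
        @"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)\b|(?<text>[A-Z]+(?:[ -][A-Z]+)*))");

    // Returns "arabic", "roman" or "text", or null when nothing matches.
    public static string Classify(string line)
    {
        Match match = Pattern.Match(line);
        if (!match.Success) return null;
        foreach (string name in new[] { "arabic", "roman", "text" })
        {
            if (match.Groups[name].Success) return name;
        }
        return null;
    }
}
```

A word made up entirely of Roman-numeral letters would still be classified as roman, so the Roman parser still needs to fail gracefully on input it cannot make sense of.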