java regex groovy match roman-numerals

Match several occurrences or zero (in this order) using regular expressions


I want to match Roman numerals using Groovy regular expressions (I have not tried this in Java, but it should behave the same). I found an answer on this website in which someone suggested the following regex:

/M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})/

The problem is that an expression like /V?I{0,3}/ does not seem to be greedy in Groovy. For a string like "Book number VII", the matcher /V?I{0,3}/ returns "V" and not "VII" as desired.

Obviously, if we use the pattern /VI+/ then we DO get the match "VII", but this solution is not valid if the string is something like "Book number V", as we will get no matches.

I tried to force the longest possible match by using a possessive quantifier, /VI{0,3}+/ or even /VI*+/, but I still get the match "V" instead of "VII".

Any ideas?


Solution

  • Why not just use (IX|IV|V?I{1,3}|V)? Every alternative in that group has to consume at least one character.
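
A minimal Java sketch of the difference between the two units groups, in case it helps. The class name RomanUnits, the helper firstMatch, and the \b word-boundary anchors are assumptions added for illustration; they are not part of the question or of the suggested pattern. The idea it tries to show: V?I{0,3} is allowed to match the empty string, so the first match found in "Book number VII" is an empty one at the start of the text, whereas the suggested group cannot match without consuming a numeral.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RomanUnits {
    // Units group from the question: every part is optional, so "" is a valid match.
    static final Pattern ORIGINAL = Pattern.compile("V?I{0,3}");

    // Units group from the answer. The \b anchors are an added assumption,
    // used here so that letters inside ordinary words are not picked up.
    static final Pattern SUGGESTED = Pattern.compile("\\b(IX|IV|V?I{1,3}|V)\\b");

    // Returns the first match of the pattern in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Empty match at index 0, before the scan ever reaches "VII".
        System.out.println("[" + firstMatch(ORIGINAL, "Book number VII") + "]");
        // The suggested group has to consume at least one numeral character.
        System.out.println(firstMatch(SUGGESTED, "Book number VII"));
        System.out.println(firstMatch(SUGGESTED, "Book number V"));
        System.out.println(firstMatch(SUGGESTED, "Book number IV"));
    }
}
```

The same pattern string should carry over to Groovy, since Groovy uses java.util.regex underneath. Note that with the word boundaries a standalone capital "I" (the pronoun) would also be reported as a numeral, so real text may need extra context checks.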