Can someone please help me with this?
I'm trying to match Roman numerals followed by a ".", then a space and a capital letter. For example:
I. And here is a line.
II. And here is another line.
X. Here is again another line.
So, the regex should match "I. A", "II. A" and "X. H".
I tried this: "^(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}){1,4}\.\s[A-Z]"
But the problem is that this regex also matches ". A", and I don't want that.
In summary, it should have at least one Roman numeral, followed by a ".", then a space and a capital letter.
You need a (?=[LXVI]) lookahead at the start that requires at least one Roman numeral letter at the start of the string:
^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
#^^^^^^^^^
See the regex demo. Not sure why you used {1,4}, I suggest removing it: it lets the second group repeat, so invalid strings such as "IVIV." would also match.
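To see the difference between the two patterns, here is a minimal Python sketch. The names `ORIGINAL`, `FIXED` and `matched_text` are made up for illustration, and Python's `re` module is just one engine you could test with; the patterns themselves are copied from the question and from the lookahead version above.

```python
import re

# Pattern from the question, kept for comparison.
ORIGINAL = re.compile(r"^(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}){1,4}\.\s[A-Z]")
# Lookahead version: the first char must be one of L, X, V, I.
FIXED = re.compile(r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]")


def matched_text(pattern, line):
    """Return the text matched at the start of `line`, or None if no match."""
    m = pattern.search(line)
    return m.group(0) if m else None


samples = [
    "I. And here is a line.",
    "II. And here is another line.",
    "X. Here is again another line.",
    ". And no numeral here.",   # should be rejected
    "IVIV. An invalid numeral.",  # only the {1,4} version accepts this
]

for line in samples:
    print(repr(line), "|", matched_text(ORIGINAL, line), "|", matched_text(FIXED, line))
```

Both patterns agree on the three sample lines from the question; they should differ only on the last two lines, where the original pattern still finds a match.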
Another workaround here would be to use a word boundary right after ^:
^\b(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
#^^
This disallows a match where . appears at the start, since \b, placed at the same position as the start of the string, requires the next char to be a word char (and here, it must be a Roman numeral letter).
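The word boundary variant can be checked the same way. This is only a sketch in Python; `WORD_BOUNDARY` and `starts_with_numeral` are hypothetical names, and the pattern is the `^\b` version shown above.

```python
import re

# ^\b can only succeed when the first char of the string is a word char,
# so a leading "." is rejected before the numeral groups are even tried.
WORD_BOUNDARY = re.compile(r"^\b(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]")


def starts_with_numeral(line):
    """True if `line` starts with a numeral, a dot, a whitespace char and a capital."""
    return WORD_BOUNDARY.search(line) is not None


for line in ["XIV. Something", "II. And here is another line.", ". And no numeral here."]:
    print(repr(line), starts_with_numeral(line))
```

A leading letter that is not part of a numeral, as in "A. B", passes the \b check but still fails, because both groups then match empty and \. cannot match "A".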
Regarding \.\s[A-Z], you may enhance it by adding + or * after \s, and if you ever need to check for it but exclude it from the match, turn it into a positive lookahead, (?=\.\s+[A-Z]) or (?=\.\s*[A-Z]).
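A short Python sketch of these tail variants, assuming the lookahead version of the pattern is used as the base. The names `WITH_TAIL`, `NUMERAL_ONLY`, `OPTIONAL_SPACE` and `matched_text` are made up for the example.

```python
import re

NUMERAL = r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

# \s+ consumes one or more whitespace chars as part of the match.
WITH_TAIL = re.compile(NUMERAL + r"\.\s+[A-Z]")
# The positive lookahead checks the tail but leaves it out of the match.
NUMERAL_ONLY = re.compile(NUMERAL + r"(?=\.\s+[A-Z])")
# \s* also accepts a capital letter directly after the dot.
OPTIONAL_SPACE = re.compile(NUMERAL + r"(?=\.\s*[A-Z])")


def matched_text(pattern, line):
    """Return the text matched at the start of `line`, or None if no match."""
    m = pattern.search(line)
    return m.group(0) if m else None


print(matched_text(WITH_TAIL, "II.   And here"))     # numeral plus dot, spaces, capital
print(matched_text(NUMERAL_ONLY, "II.   And here"))  # numeral only
print(matched_text(NUMERAL_ONLY, "X.Here"))          # no whitespace, so no match
print(matched_text(OPTIONAL_SPACE, "X.Here"))        # whitespace is optional here
```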