I am writing a lexer that needs to find the boundaries of strings. These strings may be quoted using Unicode characters (i.e. left “ and right ” double quotation marks) that can differentiate things such as strings-within-strings.
To test Unicode character properties I am using the ICU4C library.
I currently test for quotation marks using u_hasBinaryProperty(cp, UCHAR_QUOTATION_MARK).
This works well for finding the quotation marks themselves, but falls short in being able to tell an open-quote from a close-quote.
Is there some property value I can test or other functionality I can use to test the orientation of the quote mark without explicitly testing it against each possible type of quote?
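Whether a quotation mark is opening or closing depends on the language and locale, so it is out of scope for Unicode, which deals with scripts rather than languages. For example, “ (U+201C) opens a quotation in English but closes one in German („like this“), and Swedish uses ” (U+201D) on both sides.
You will have to look elsewhere than the character properties for this information.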
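One place to look is a per-language table of quote pairs. The sketch below is only an illustration of that idea, not part of ICU: the names QuoteRole, QuotePair, kQuotePairs and quote_role are hypothetical, and the four table entries are hand-written examples of common primary quotation marks, not an exhaustive or authoritative list.

```cpp
#include <string_view>

// Role a code point plays as a quotation mark in one particular language.
enum class QuoteRole { None, Open, Close, Both };

// Primary quotation marks of one language (hypothetical table entry).
struct QuotePair {
    std::string_view lang;  // language tag, e.g. "en"
    char32_t open;          // opening mark
    char32_t close;         // closing mark
};

// Illustrative, hand-written data; a real lexer would load this from
// locale data instead of hard-coding it.
constexpr QuotePair kQuotePairs[] = {
    {"en", U'\u201C', U'\u201D'},  // “ ... ”
    {"de", U'\u201E', U'\u201C'},  // „ ... “
    {"sv", U'\u201D', U'\u201D'},  // ” ... ”
    {"fr", U'\u00AB', U'\u00BB'},  // « ... »
};

// Returns the role of `cp` in language `lang`, or None if the language is
// not in the table or `cp` is not one of its primary quotation marks.
QuoteRole quote_role(std::string_view lang, char32_t cp) {
    for (const QuotePair& p : kQuotePairs) {
        if (p.lang != lang) continue;
        const bool opens = (cp == p.open);
        const bool closes = (cp == p.close);
        if (opens && closes) return QuoteRole::Both;
        if (opens) return QuoteRole::Open;
        if (closes) return QuoteRole::Close;
        return QuoteRole::None;
    }
    return QuoteRole::None;
}
```

With this table, quote_role("en", U'\u201C') is Open while quote_role("de", U'\u201C') is Close, which is exactly why a single per-character property cannot answer the question. As far as I know, ICU also exposes CLDR's per-locale quotation delimiters through ulocdata_getDelimiter (with ULOCDATA_QUOTATION_START and ULOCDATA_QUOTATION_END), which may be a better data source than a hand-written table; check the ICU documentation before relying on that.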