c++ unicode icu

Checking if a Unicode quotation mark is opening or closing


I am writing a lexer that needs to find the boundaries of strings. These strings may be quoted with Unicode quotation marks (e.g. left and right double quotation marks), which makes it possible to distinguish cases such as strings within strings.

To test Unicode character properties I am using the ICU4C library.

I currently test for quotation marks using u_hasBinaryProperty(cp, UCHAR_QUOTATION_MARK).

This works well for finding the quotation marks themselves, but falls short in being able to tell an open-quote from a close-quote.

Is there some property value I can test or other functionality I can use to test the orientation of the quote mark without explicitly testing it against each possible type of quote?


Solution

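  • Whether a quotation mark is opening or closing depends on the language/locale, so it cannot be read off the character alone. Unicode describes characters, not how a particular language uses them. Examples of the same two guillemets used with different orientations: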

    • «Swiss»
    • »Polish«
    • »Finnish»

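    The orientation therefore has to come from somewhere other than the character itself, such as knowledge of the language the text is written in.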
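
One way to act on this in a lexer is to pass the quoting convention in explicitly instead of asking the character. The following is only a minimal sketch and not ICU functionality: `Convention`, `QuotePair`, `quotesFor`, and `classify` are hypothetical names, and the three conventions simply mirror the examples above rather than any authoritative locale data.

```cpp
// Hypothetical sketch: the role of a quote comes from a caller-chosen
// convention, not from the code point alone. None of these names are ICU API.

enum class QuoteRole { NotAQuote, Opening, Closing, Either };

// Illustrative conventions, named after the examples in the answer.
enum class Convention { Swiss, Polish, Finnish };

struct QuotePair {
    char32_t open;
    char32_t close;
};

constexpr char32_t kLeftGuillemet  = 0x00AB;  // «
constexpr char32_t kRightGuillemet = 0x00BB;  // »

// Returns the opening and closing mark for the given convention.
constexpr QuotePair quotesFor(Convention c) {
    switch (c) {
        case Convention::Swiss:   return {kLeftGuillemet,  kRightGuillemet};
        case Convention::Polish:  return {kRightGuillemet, kLeftGuillemet};
        case Convention::Finnish: return {kRightGuillemet, kRightGuillemet};
    }
    return {0, 0};  // unreachable for valid enum values
}

// Classifies a code point under a convention. When the same mark both opens
// and closes, the result is Either and the lexer must use its nesting state.
constexpr QuoteRole classify(Convention c, char32_t cp) {
    const QuotePair q = quotesFor(c);
    const bool opens  = (cp == q.open);
    const bool closes = (cp == q.close);
    if (opens && closes) return QuoteRole::Either;
    if (opens)  return QuoteRole::Opening;
    if (closes) return QuoteRole::Closing;
    return QuoteRole::NotAQuote;
}

// The same character gets a different role depending on the convention.
static_assert(classify(Convention::Swiss,  kRightGuillemet) == QuoteRole::Closing, "");
static_assert(classify(Convention::Polish, kRightGuillemet) == QuoteRole::Opening, "");
```

With the Finnish-style convention the same mark both opens and closes, so `classify` returns `Either` and the lexer has to decide from context, for example by tracking whether it is currently inside a string.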