Tags: c++, unicode, string, internationalization, icu

ICU Unicode Normal vs Fullwidth


I am somewhat new to Unicode and Unicode strings. I'm trying to understand the difference between a "fullwidth" symbol and a normal one.

Take these two for example:

Normal (U+20A9 WON SIGN): http://www.fileformat.info/info/unicode/char/20a9/index.htm

Fullwidth (U+FFE6 FULLWIDTH WON SIGN): http://www.fileformat.info/info/unicode/char/ffe6/index.htm

I notice that the fullwidth character is defined in terms of U+20A9, and 20A9 is also the number of the normal one. So what is the value of U?

When using libraries like ICU, is there a way to specify that they always return the normal form rather than the fullwidth one?

Thanks,


Solution

  • "U+" followed by a hexadecimal number is the notational convention for a Unicode code point. There is no 'value' of U; the code point is just the hex number after the prefix.

    U+0020, for example, is a space. Its code point value is 32 decimal, 20 hex.

    Fullwidth characters are a whole other story.

    Back in the days of the IBM 3270 terminals, Hanzi (Chinese characters) took up two positions in display memory, so they also took up two columns on the screen. To make things line up neatly, IBM defined a set of 'full-width' (better would have been 'double-width') letters and numbers.

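    If some ICU API is delivering fullwidth characters, you can use the Normalizer with a compatibility form (NFKC or NFKD) to get rid of them. U+FFE6, for example, has the compatibility decomposition <wide> U+20A9, so NFKC maps it to the normal won sign; the canonical forms (NFC, NFD) leave fullwidth characters unchanged. You might also file a ticket in ICU's issue tracker, since an API returning fullwidth forms unprompted seems odd.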
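To make the point about the "U+" notation concrete, here is a small sketch in standard C++. The helper names `to_u_notation` and `from_u_notation` are hypothetical, not part of ICU or the standard library; they only show that "U+" is a fixed prefix and that the hex digits after it are the entire numeric value of the code point.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: format a code point in the conventional "U+XXXX"
// notation ("U+" prefix, then at least four uppercase hex digits).
std::string to_u_notation(char32_t cp) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
    return buf;
}

// Hypothetical helper: parse "U+20A9" back into its numeric code point.
// The "U+" carries no value of its own; only the hex digits after it do.
char32_t from_u_notation(const std::string& s) {
    if (s.size() < 3 || s[0] != 'U' || s[1] != '+')
        throw std::invalid_argument("expected U+XXXX");
    return static_cast<char32_t>(std::stoul(s.substr(2), nullptr, 16));
}
```

For instance, `from_u_notation("U+0020")` yields 32 (the space), and `to_u_notation(0xFFE6)` yields "U+FFE6". The normal and fullwidth won signs are simply two different numbers, 0x20A9 and 0xFFE6.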