Tags: c++, c, ncurses, curses

Extracting wide chars w/ attributes in ncurses


[Please note I am using _XOPEN_SOURCE_EXTENDED 1 and setlocale(LC_CTYPE, "").]

Curses includes various functions for extracting characters from the screen; they can be divided into those which grab just the text and those which grab the text plus attributes (bold, color, etc.). The former use wchar_t (or char) and the latter curses' own chtype.

There are constants to mask a chtype to get just the character or just the attributes -- A_CHARTEXT and A_ATTRIBUTES. However, from the values of these, it is easy to see that there will be collisions with wchar_t values over 255. A_ATTRIBUTES is 64 bits wide and only the lower 8 bits are unset.

If the base type internally is chtype, this would mean ncurses was unworkable with most of Unicode, but it isn't -- you can use hardcoded strings in UTF-8 source and write them out with attributes, no problem. Where it gets interesting is getting them back again.

wchar_t s[] = L"\412";

This character has a value of 266 and displays as Ċ. However, when extracted into a chtype using, e.g., mvwinchnstr(), it is exactly the same as a blank (character 10, strictly speaking a line feed) with the COLOR_PAIR(1) attribute (256) set. And in fact, if you take the extracted chtype and redisplay it, you get just that -- a blank with COLOR_PAIR(1) set.

But if you extract it instead into a wchar_t with, e.g., mvwinnwstr(), it's correct, as is a colored space. The problem with this, of course, is that the attributes are gone. This implies the attributes are being masked out correctly, which is demonstrably impossible with a chtype, since a chtype for both of these has the same value (266). In other words, the internal representation is obviously neither a chtype nor a wchar_t.
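To make the collision concrete, here is a minimal sketch in plain C that does not use curses at all. The DEMO_* names and demo_* helpers are hypothetical, and the layout (character in the low 8 bits, color pair starting at bit 8) is an assumption chosen to match the values quoted above (COLOR_PAIR(1) == 256); a given curses build may differ.

```c
#include <stdint.h>

/* Assumed layout, NOT taken from curses.h: the low 8 bits hold the
 * character and the color pair number starts at bit 8. */
#define DEMO_CHARTEXT      0xFFu
#define DEMO_COLOR_PAIR(n) ((uint32_t)(n) << 8)

/* Combine a character code and attribute bits into one packed cell value. */
uint32_t demo_pack(uint32_t ch, uint32_t attrs)
{
    return ch | attrs;
}

/* Recover the character part of a packed cell value. */
uint32_t demo_text(uint32_t packed)
{
    return packed & DEMO_CHARTEXT;
}

/* Recover the color pair number of a packed cell value. */
uint32_t demo_pair(uint32_t packed)
{
    return (packed >> 8) & 0xFFu;
}
```

Under these assumptions, demo_pack(10, DEMO_COLOR_PAIR(1)) and demo_pack(0x10A, 0) produce the same value, 266, so nothing that looks only at the packed value can tell character 10 with color pair 1 apart from an uncolored Ċ.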

I do not use ncurses much, and I notice there are other curses implementations (e.g. Oracle's) with functions that imply the chtype there might not have this problem. In any case, is there a way w/ ncurses to unambiguously extract wide chars together with their attributes?

[I've tagged this C and C++ since it is applicable in both contexts.]


Solution

  • It is more complicated than that. But briefly:

    • In the SVr4 implementation, there was just chtype.
    • The X/Open standardization work added multibyte characters, represented in cchar_t.
    • Not blatantly obvious in the X/Open documentation, but seen in the corresponding Unix implementations, the chtype and cchar_t were not envisioned as possibly different views of the same data. You can only make 8-bit encodings with the former.
    • Few applications exercise the Unix implementations deeply enough to make this apparent (in fact, at least one vendor's XPG4 implementation never worked well enough to do useful testing; so much for the state of the art).
    • That lack of integration was overlooked in ncurses, where combining the two seemed a natural thing to do.
    • ncurses accepts multibyte strings in addstr (none of the Unix implementations do).
    • ncurses attempts to make information that was set via one style of interface available via the other.
    • There are obviously limitations: chtype corresponds to a single cell on the screen, and can hold only an 8-bit character. Interfaces such as winnstr which return a string will work within that constraint. The winchnstr function does return an array of chtype values.
    • If you want the attributes for a cell which is not an 8-bit character, your best option is to retrieve it via the analogous win_wchnstr, which fills an array of cchar_t values instead of chtype values.