
When should I use a derived class facet in lieu of a base class facet?


Several standard base class facets in the C++ Standard Library have default behavior that depends on the classic "C" locale (std::locale::classic()). If your program needs culture-specific functionality, it therefore seems reasonable to switch to the derived class facets (the byname facets), whose behavior depends on the locale name given at their construction.

For example, std::ctype provides classic "C" character classification:

§22.4.1.3.3

   static const mask* classic_table() noexcept;

Returns: A pointer to the initial element of an array of size table_size which represents the classifications of characters in the "C" locale

Does this mean that std::ctype behaves differently from the locale in which it is installed? For instance, say I have a Japanese locale:

std::locale loc("ja_JP");

and I wanted to use a facet that performed character classification on Japanese characters. Character classification is what std::ctype is for:

auto& f = std::use_facet<std::ctype<char>>(loc);

Will f's ctype methods classify characters based on the Japanese locale, or the classic "C" one? My first guess is the "C" locale based on the Standard quote above, but in fact it is the Japanese locale. I'm wondering why the quote doesn't agree with what is happening here.
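One way to probe the difference is to compare the table a facet actually uses against classic_table(). The following is only a sketch: uses_classic_table and with_default_ctype are hypothetical helper names, and it assumes a C++11 library in which ctype<char>::table() and classic_table() are publicly accessible.

```cpp
#include <locale>

// Hypothetical helper: true when the ctype<char> facet installed in `loc`
// classifies through the classic "C" table.
bool uses_classic_table(const std::locale& loc) {
    const auto& f = std::use_facet<std::ctype<char>>(loc);
    return f.table() == std::ctype<char>::classic_table();
}

// Hypothetical helper: a copy of `base` whose ctype<char> facet has been
// replaced by a default-constructed one. The locale takes ownership of the
// facet because its reference count argument defaults to 0.
std::locale with_default_ctype(const std::locale& base) {
    return std::locale(base, new std::ctype<char>);
}
```

Calling uses_classic_table(with_default_ctype(loc)) should yield true for any loc, because a ctype<char> constructed without a table falls back to classic_table(). What uses_classic_table(std::locale("ja_JP")) returns depends on the implementation, and constructing that locale throws std::runtime_error if it is not installed on the system.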

Here are my questions:

  • Why does the Standard say that ctype performs "C" character classification when ctype actually classifies based on the locale with which it is being used?

  • Since the above is true, where do derived class facets come in? Why should I use a derived class facet when the base class already uses the locale I want?


Solution

  • Only the default-constructed std::ctype<char> facet uses classic_table for its classification. The facet obtained from the system-provided "ja_JP" is not an example of that.

  • When talking about derived facets, people generally mean user-defined facets derived from std::ctype and the like, not the system-provided byname facets. You may use a derived ctype facet if you want to redefine some character class: for example, to treat commas as whitespace when parsing a comma-separated input stream, or to stop treating spaces and tabs as whitespace so that a stream can be parsed line by line.
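As an illustration of the comma case, here is a minimal sketch of such a user-defined facet. The names comma_is_space and split_fields are made up for this example; the approach is to copy the classic table, flag ',' as whitespace, and pass the modified table to the std::ctype<char> constructor.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical user-defined facet: classic "C" classification, except that
// ',' is additionally classified as whitespace.
class comma_is_space : public std::ctype<char> {
public:
    explicit comma_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}

private:
    static const mask* make_table() {
        // Copy the classic table once, then flag ',' as whitespace.
        static const std::vector<mask> table = [] {
            std::vector<mask> t(classic_table(), classic_table() + table_size);
            t[static_cast<unsigned char>(',')] |= space;
            return t;
        }();
        return table.data();
    }
};

// Hypothetical helper: split a line on commas and ordinary whitespace by
// imbuing the stream with a locale that carries the facet above.
std::vector<std::string> split_fields(const std::string& line) {
    std::istringstream in(line);
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> fields;
    for (std::string field; in >> field;) {
        fields.push_back(field);
    }
    return fields;
}
```

With this facet, operator>> skips commas the same way it skips spaces, so split_fields("a,b c") yields the three fields a, b and c. Consecutive separators collapse, which means empty fields are dropped; input that needs to preserve them would call for a different approach.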