
Locale-invariant string processing with strtod, strtof, atof, printf?


Are there any plans to add versions of the C standard library string-processing functions that are invariant under the current locale?

Currently there are lots of fragile workarounds, for example this one from jansson/strconv.c:

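#include <locale.h>     /* localeconv */
#include <string.h>     /* strchr */
#include "strbuffer.h"  /* jansson-internal header that defines strbuffer_t */

static void to_locale(strbuffer_t *strbuffer)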
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(strbuffer->value, '.');
    if(pos)
        *pos = *point;  /* copies only the first byte of decimal_point */
}

static void from_locale(char *buffer)
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(buffer, *point);  /* looks for the first byte of decimal_point only */
    if(pos)
        *pos = '.';
}

These functions preprocess their input so that it can be used independently of the current locale, under the assumptions that

  1. the decimal separator is a single byte;
  2. no call to setlocale happens between these fix-up functions and the call to any of the affected functions;
  3. the string can be modified before conversion.

(1) implies that the preprocessing approach breaks on exotic locales (see https://en.wikipedia.org/wiki/Decimal_mark#Hindu.E2.80.93Arabic_numeral_system for examples). (2) implies that the preprocessing approach cannot be thread-safe without a lock, and that lock must be added to the C library. (3) is just stupid.

If it were only possible to specify the locale for a single call to a string-processing function as a parameter, not affecting any other threads, none of these restrictions would apply.
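
POSIX.1-2008 already covers part of this: uselocale() installs a locale for the calling thread only, so a single conversion can be wrapped without touching the global locale or other threads. A minimal sketch, assuming a POSIX.1-2008 system such as glibc (some BSD and macOS versions declare these functions in <xlocale.h> instead); the wrapper name strtod_c is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L /* newlocale, uselocale, freelocale */
#include <errno.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: strtod() that always treats '.' as the decimal point.
 * uselocale() changes the locale of the calling thread only, so neither the
 * global setlocale() state nor other threads are affected. */
double strtod_c(const char *s, char **end)
{
    /* A locale object whose LC_NUMERIC category is "C". Creating it per call
     * keeps the sketch short; real code would create it once and reuse it. */
    locale_t c_numeric = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_numeric == (locale_t)0)
        return strtod(s, end); /* could not create it: use the current locale */

    locale_t previous = uselocale(c_numeric); /* switch this thread only */
    double value = strtod(s, end);
    int saved_errno = errno;                  /* keep strtod's ERANGE, if any */
    uselocale(previous);                      /* restore the thread's locale */
    freelocale(c_numeric);                    /* safe: no longer in use */
    errno = saved_errno;
    return value;
}
```

This avoids all three assumptions above, at the cost of two thread-locale switches per call; strtod_c("2.5", &end) parses 2.5 even when setlocale() has selected a locale whose decimal point is a comma.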

Questions:

  1. Are there any reports to WG14 or WG21 that address this defect?
  2. If so, why haven't they been merged into the standard? It would be nothing more than a new set of functions that take a locale as an argument.
  3. What is the canonical workaround?

Update:

After searching the Internet, I found the *_l functions, available on FreeBSD, GNU/Linux and Mac OS X. Similar functions also exist on Windows. These solve my problem; however, they are not in POSIX, which is a superset of C (not strictly: POSIX relaxes some of C's rules on pointers). So questions 1 and 2 remain open.
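
On glibc, for example, strtod_l is a GNU extension declared in <stdlib.h> when _GNU_SOURCE is defined; it takes the locale as an explicit last argument, so neither the global locale nor the thread's locale changes. A minimal sketch under that assumption (the helper names c_numeric_locale and parse_number are hypothetical; on Windows the corresponding function is _strtod_l, taking a _locale_t from _create_locale):

```c
#define _GNU_SOURCE /* glibc: exposes strtod_l in <stdlib.h> */
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: lazily create one "C" LC_NUMERIC locale object.
 * The lazy initialisation is not thread-safe as written; in multithreaded
 * code create it once at startup (or via pthread_once). */
static locale_t c_numeric_locale(void)
{
    static locale_t loc = (locale_t)0;
    if (loc == (locale_t)0)
        loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    return loc;
}

/* Hypothetical helper: parse with '.' as the decimal point, whatever
 * setlocale() or uselocale() has selected. */
double parse_number(const char *s, char **end)
{
    locale_t loc = c_numeric_locale();
    if (loc == (locale_t)0)
        return strtod(s, end);    /* newlocale failed: fall back to current locale */
    return strtod_l(s, end, loc); /* locale passed explicitly, no global state */
}
```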


Solution

  • BSD and macOS Sierra (and Mac OS X before it) support _l functions that allow you to specify the locale, rather than relying on the current locale. For example:

    int
    fprintf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);
    
    int
    printf_l(locale_t loc, const char * restrict format, ...);
    
    int
    snprintf_l(char * restrict str, size_t size, locale_t loc, const char * restrict format, ...);
    
    int
    sprintf_l(char * restrict str, locale_t loc, const char * restrict format, ...);
    

    and:

    int
    fscanf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);
    
    int
    scanf_l(locale_t loc, const char * restrict format, ...);
    
    int
    sscanf_l(const char * restrict str, locale_t loc, const char * restrict format, ...);
    

    As a general design, this seems sensible. The type locale_t is not part of Standard C but is part of POSIX (defined in <locale.h> there), and it is used in <ctype.h>, amongst other places. The BSD man pages say that the header to use is <xlocale.h> rather than <locale.h>; standardization would presumably settle that. Unless there is a major flaw in the design of the BSD functions, they should be a very good basis for any standardization effort, whether under POSIX or Standard C.

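    One possible issue with the BSD design is that locale_t is passed by value rather than by (const, restrict-qualified) pointer, which is a little surprising; in practice, though, locale_t is an opaque handle (a pointer typedef in glibc and the BSDs), so only the handle is copied. The choice is also consistent with POSIX functions such as: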

    int   isalpha_l(int, locale_t);
    

    A similar scheme might be devised for handling time zone settings, too. There'd be more work in setting that up, since there isn't already a time zone type (whereas locale_t is already part of POSIX and could probably be adopted unchanged into Standard C). But combined with locale settings, it could make the time routines more easily usable in diverse environments from a single executable.
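
The BSD snprintf_l family shown above has no glibc counterpart, so portable code typically needs a fallback. A minimal sketch, assuming vsnprintf_l is declared by <xlocale.h> on macOS and FreeBSD (following the documented prototype, untested here) and falling back to the thread-local uselocale() elsewhere; snprintf_c is a hypothetical name:

```c
#define _GNU_SOURCE /* glibc: expose newlocale/uselocale */
#include <locale.h>
#include <stdarg.h>
#include <stdio.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h> /* assumption: declares vsnprintf_l after <stdio.h> */
#endif

/* Hypothetical helper: snprintf() that always uses '.' as the decimal point. */
int snprintf_c(char *buf, size_t size, const char *fmt, ...)
{
    /* Created once and never freed; the lazy init is not thread-safe. */
    static locale_t c_loc = (locale_t)0;
    if (c_loc == (locale_t)0)
        c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    va_list ap;
    va_start(ap, fmt);
#if defined(__APPLE__) || defined(__FreeBSD__)
    int n = vsnprintf_l(buf, size, c_loc, fmt, ap); /* explicit locale */
#else
    locale_t prev = uselocale(c_loc);  /* switch this thread only */
    int n = vsnprintf(buf, size, fmt, ap);
    uselocale(prev);                   /* restore the thread's locale */
#endif
    va_end(ap);
    return n;
}
```

With either branch, snprintf_c(buf, sizeof buf, "%.2f", 3.25) writes "3.25" whatever setlocale() has selected for LC_NUMERIC.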