How to handle Unicode text with C/C++ servlets/handlers in the G-WAN web server?


I'm planning to write a web application using C/C++ servlets/handlers for the G-WAN web/app server. I would like my application to work with multiple languages, including those that use multibyte characters, so I am wondering how I should handle this in G-WAN servlets.

The xbuf_t structure seems to use char* as its underlying storage buffer for building the HTTP response, and since char is a single byte, I would like to know how this affects text containing Unicode or multibyte characters. I'm a bit reluctant to add heavy Unicode libraries such as IBM's International Components for Unicode (ICU) and the like.

Could someone explain how others are dealing with this situation and, if a library is required, what options are available for handling Unicode, preferably with as few and as small dependencies as possible?


Solution

  • The server response (called reply in the servlet examples) can contain binary data, so this is certainly possible. There are examples that dynamically generate and send content (GIF, PNG, JSON, etc.), so there's no limit to what you can send as a reply.

    Without Unicode, you are using xbuf_xcat(), which acts like sprintf() with a dynamically growing buffer (the server reply).

    What you should do is build your Unicode reply (with your favorite Unicode library; ANSI C and almost all languages have one) and then copy it into the reply buffer with xbuf_ncat().

    Of course, you can also call xbuf_ncat() on the fly for each piece of data you build, rather than once for the whole buffer at the end of your servlet. Your choice.

    Note that, depending on your application, UTF-8 may be a better choice than a wide-character Unicode encoding (such as UTF-16), because most of your text might then be able to go through xbuf_xcat() (this is faster than a buffer copy).

    You will then only need to call xbuf_ncat() for the non-ASCII characters.

    The xbuf_xxx() functions could be modified to support UTF-8/Unicode (for example, with a flag to tell which encoding is used), but this will be for later.
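
If you go the UTF-8 route and want to avoid a Unicode library altogether, encoding code points by hand is small enough to write yourself. The following is a minimal sketch in plain C, not G-WAN code: buf_t, buf_ncat(), buf_put_cp() and utf8_encode() are hypothetical names made up for this illustration, and buf_t only stands in for the reply buffer so that the example compiles on its own. In a real servlet the length-based append would presumably be xbuf_ncat() on the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the reply buffer (NOT G-WAN's xbuf_t). */
typedef struct {
    char  *ptr;  /* bytes accumulated so far, always NUL-terminated */
    size_t len;  /* number of bytes in use */
    size_t cap;  /* allocated capacity */
} buf_t;

/* Append exactly n bytes, growing the buffer as needed.
   Plays the role of a length-based append such as xbuf_ncat().
   Returns 0 on success, -1 if memory could not be allocated. */
int buf_ncat(buf_t *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->ptr, cap);
        if (p == NULL)
            return -1;
        b->ptr = p;
        b->cap = cap;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    b->ptr[b->len] = '\0';
    return 0;
}

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Append one code point to the buffer as UTF-8.
   Returns 0 on success, -1 on an invalid code point or allocation failure. */
int buf_put_cp(buf_t *b, uint32_t cp)
{
    unsigned char tmp[4];
    size_t n = utf8_encode(cp, tmp);
    return n ? buf_ncat(b, (const char *)tmp, n) : -1;
}
```

For instance, appending "caf" with buf_ncat() and then U+00E9 with buf_put_cp() leaves the five bytes 63 61 66 C3 A9 in the buffer. Because UTF-8 never produces a zero byte for a non-zero code point, the result is still an ordinary char string; passing the length explicitly simply avoids relying on that.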