
Printing to the terminal in an encoding-neutral manner


I would like to print a string to the screen regardless of its encoding (UTF-8, UTF-16, or UTF-32). The string is stored in a char array, so I need to carry on past any null bytes and keep writing to stdout; this rules out the printf family and its friends.

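/* "Unicode" encoded as UTF-16BE: each ASCII letter is preceded by a 0x00 byte. */
char text[] = { 0x00, 0x55, 0x00, 0x6E, 0x00, 0x69, 0x00, 0x63, 0x00, 0x6F, 0x00, 0x64, 0x00, 0x65 };

/* Unlike printf("%s"), fwrite does not stop at the first 0x00 byte. */
fwrite( text, sizeof(char), sizeof(text), stdout );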

To this end, I've chosen the solution above, which lets me print text in any of the UTF encoding forms. I understand that some terminals will not display the characters correctly, but that is not my concern, as it's a configuration option outside the application.
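To make the printf problem concrete, here is a minimal sketch that counts how many bytes of a buffer a NUL-terminated string function such as strlen, fputs, or printf("%s") would actually consume. The helper name bytes_before_nul is hypothetical, not something from the question. For the UTF-16BE bytes in the question it returns 0, because the very first byte is 0x00, whereas fwrite writes however many bytes it is told to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many bytes a NUL-terminated string function
   (strlen, fputs, printf("%s")) would consume from buf before stopping at
   the first 0x00 byte. Bounded by n so it never reads past the array. */
size_t bytes_before_nul(const unsigned char *buf, size_t n)
{
    if (n == 0)
        return 0;
    assert(buf != NULL);
    const unsigned char *nul = memchr(buf, 0x00, n);
    return nul != NULL ? (size_t)(nul - buf) : n;
}
```

With the question's array you would call it as bytes_before_nul((const unsigned char *)text, sizeof text); the same word encoded as UTF-8 has no 0x00 bytes at all, so all 7 bytes would be seen.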

My application has a setting that selects which message catalogue to load (en_EN.UTF-8, etc.); however, I want to avoid having to convert strings in the code based on the currently selected locale.
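If the program does need to know which encoding the selected locale implies, POSIX exposes it through nl_langinfo(CODESET). The sketch below assumes a POSIX system (langinfo.h); the function name locale_codeset is hypothetical. Passing "" picks up whatever LANG, LC_ALL, or LC_CTYPE select in the environment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch the LC_CTYPE category to locale_name and
   return the name of that locale's character encoding (e.g. "UTF-8").
   Pass "" to use the locale selected by the environment.
   Returns NULL if the C library does not know the requested locale.
   Side effect: changes the process-wide LC_CTYPE locale. */
const char *locale_codeset(const char *locale_name)
{
    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    /* nl_langinfo never returns NULL; the returned string may be
       overwritten by later calls, so copy it if you need to keep it. */
    return nl_langinfo(CODESET);
}
```

The exact name returned for a given locale is platform-specific (the "C" locale, for instance, reports an ASCII codeset under different names on different C libraries), so treat it as an opaque string to hand to a conversion routine rather than something to compare against a fixed list.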

Could I please get a review on this approach before I let it go live?


Solution

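  • You can't do that. When you deal with text, the encoding matters a great deal: a terminal interprets the bytes it receives according to the encoding it is configured for, so UTF-16 or UTF-32 bytes sent to a terminal that expects UTF-8 will not be decoded as the characters you intended (anything beyond plain ASCII comes out as garbage). So you must convert.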

    It is also a bad idea to keep this data in a char array; use a byte array instead, because:

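    • If byte is not already defined in some header, define it (or typedef it) as unsigned char. Plain char may be signed or unsigned depending on the compiler and platform, and that leads to surprises: with a signed char, the byte 0xE9 reads as -23, so a test such as c >= 0x80 never succeeds.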
    • It is more readable, because it makes the intent clear. When I see byte, I expect a bunch of raw bytes; when I see char, I expect plain text, which your data obviously is not.
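The conversion the answer calls for, done on byte buffers as recommended, can be sketched with POSIX iconv. This is a minimal sketch assuming a system that provides iconv.h and supports the "UTF-16BE" and "UTF-8" encoding names (glibc, musl, and GNU libiconv do); the function convert_bytes and its parameter order are hypothetical, chosen only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Encoded text is a sequence of bytes, not chars: the name states the intent,
   and unsigned char avoids the implementation-defined signedness of char. */
typedef unsigned char byte;

/* Hypothetical helper: convert inlen bytes of `in` from encoding `from`
   (e.g. "UTF-16BE") to encoding `to` (e.g. "UTF-8") with POSIX iconv.
   Writes at most outcap bytes to `out` and returns how many were written,
   or (size_t)-1 if the encodings are unsupported, the input is invalid or
   truncated, or `out` is too small. */
size_t convert_bytes(const char *to, const char *from,
                     const byte *in, size_t inlen,
                     byte *out, size_t outcap)
{
    assert(to != NULL && from != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* conversion pair not supported */

    /* iconv advances these pointers and decrements the counters as it
       converts. It never writes through the input pointer, so dropping
       const here is safe. */
    char *inp = (char *)in;
    char *outp = (char *)out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        /* Flush any pending shift sequence for stateful target encodings. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1 || inleft != 0)
        return (size_t)-1;
    return outcap - outleft;
}
```

To print the result, one option is to pass the locale's codeset (for example nl_langinfo(CODESET) after setlocale(LC_CTYPE, "")) as `to` and then fwrite(out, 1, n, stdout). Characters that the target encoding cannot represent may make the conversion fail, so the return value should be checked before writing.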