Tags: c, linux, encoding, locale

Linux & C-Programming: How can I write utf-8 encoded text to a file?


I am interested in writing UTF-8 encoded strings to a file.

I did this with the low-level functions open() and write(). First I set the locale to a UTF-8-aware one with setlocale(LC_ALL, "de_DE.utf8"). But the resulting file does not contain UTF-8 characters, only ISO-8859 encoded umlauts. What am I doing wrong?

Addendum: I don't know whether my strings are really UTF-8 encoded to begin with. I just keep them in the source file in this form: char *msg = "Rote Grütze";

See the screenshot for the content of the text file: http://img19.imageshack.us/img19/9791/picture1jh9.png


Solution

  • Changing the locale won't change the data that write() puts in the file: write() copies the bytes it is given, unchanged. You have to produce UTF-8 bytes yourself before writing them. For that purpose you can use a library such as ICU.
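
    To illustrate that write() passes bytes through regardless of the locale, here is a minimal sketch. The function name write_utf8_sample and the file path passed to it are made up for this example; the ü is spelled out as its two UTF-8 bytes (0xC3 0xBC) so the result does not depend on how the source file was saved. No setlocale() call is involved.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes "Rote Grütze\n" to `path` as UTF-8.
 * The ü is given as the explicit bytes 0xC3 0xBC, so the encoding of
 * this source file does not matter. Returns 0 on success, -1 on error. */
int write_utf8_sample(const char *path)
{
    static const char msg[] = "Rote Gr\xc3\xbctze\n";
    size_t len = strlen(msg);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() copies the bytes as they are; the locale plays no role. */
    ssize_t written = write(fd, msg, len);
    int close_rc = close(fd);

    return (written == (ssize_t)len && close_rc == 0) ? 0 : -1;
}
```

    Calling write_utf8_sample("out.txt") and then inspecting the file with a hex viewer should show c3 bc in place of the ü.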

    Edit after your edit of the question: UTF-8 and ISO-8859 encode the plain ASCII characters identically; they differ only in the "special" symbols (ümlauts, áccénts, etc.). So any text that contains none of these symbols is byte-for-byte the same in both encodings. However, if your program contains strings with those symbols, you have to make sure your text editor saves the source file as UTF-8. Sometimes you just have to tell it to.

    To sum up, the text you produce will be in UTF-8 if the strings within the source code are in UTF-8.
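
    If you are unsure which encoding your string literals ended up in, one way to check is to test whether their bytes are well-formed UTF-8. The function below is only a sketch (the name is_valid_utf8 is made up here, it is not a library call). A Latin-1 ü (byte 0xFC) followed by ASCII text fails the check, while the UTF-8 form (0xC3 0xBC) passes. Note that passing is a strong hint, not proof: some Latin-1 byte pairs happen to be valid UTF-8 as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of a UTF-8 well-formedness check (hypothetical helper).
 * Rejects stray continuation bytes, truncated sequences, overlong
 * forms, UTF-16 surrogates and code points above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (b < 0x80)                { extra = 0; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* not a valid lead byte */

        if (extra > len - i - 1)
            return false;   /* sequence is cut off at the end */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

    For the string from the question you could call is_valid_utf8((const unsigned char *)msg, strlen(msg)); a false result means the literal is not UTF-8, most likely Latin-1.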

    Another edit: Just to be sure, you can convert your source code to UTF-8 using iconv:

    iconv -f latin1 -t utf8 file.c
    

    This converts all your Latin-1 strings to UTF-8 and prints the result to standard output, so redirect it to a new file and compile that one; when you print the strings they will definitely be in UTF-8. If the converted strings show strange characters (for example "Ã¼" where you expected "ü"), then your strings were in UTF-8 already.
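
    For the curious, the Latin-1 to UTF-8 conversion itself is small enough to sketch in C. This is an illustration of the byte mapping, not iconv's actual implementation, and the name latin1_to_utf8 is made up: bytes below 0x80 are copied, every other byte becomes two bytes.

```c
#include <stddef.h>

/* Sketch (hypothetical helper): converts `len` ISO-8859-1 bytes at `in`
 * to UTF-8 in `out`. `out` must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII is identical in both encodings. */
            out[o++] = b;
        } else {
            /* U+0080..U+00FF: 110000xx 10xxxxxx */
            out[o++] = (unsigned char)(0xC0 | (b >> 6));
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

    The Latin-1 ü (0xFC) comes out as 0xC3 0xBC. Feeding it text that is already UTF-8 encodes each byte a second time, so 0xC3 0xBC turns into 0xC3 0x83 0xC2 0xBC, which displays as "Ã¼".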
