Tags: c++, string, twitter, utf-8, character-encoding

How to UTF-8 encode a character/string


I am using a Twitter API library to post a status to Twitter. Twitter requires that the post be UTF-8 encoded. The library contains a function that URL-encodes a standard string. It works perfectly for special characters such as !@#$%^&*(), but it produces the wrong encoding for accented characters (and other non-ASCII characters).

For example, 'é' gets converted to '%E9' rather than '%C3%A9' (it pretty much just writes out the character's single-byte value in hexadecimal). Is there a built-in function that could take something like 'é' as input and return something like '%C3%A9'?

edit: I am fairly new to UTF-8 in case what I am requesting makes no sense.

edit: if I have a

string foo = "bar é";

I would like to convert it to

"bar %C3%A9"

Thanks


Solution

  • If you have a wide character string, you can encode it in UTF-8 with the standard wcstombs() function. If you have it in some other encoding (e.g. Latin-1), you will have to decode it to a wide string first. The resulting UTF-8 bytes are what the URL-encoding function should then be applied to.

    Edit: ... but wcstombs() depends on your locale settings, and it looks like you can't select a UTF-8 locale on Windows. (You don't say what OS you're using.) WideCharToMultiByte() might be more useful on Windows, as you can specify the encoding in the call (pass CP_UTF8 as the code page).
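
If the input is known to be Latin-1 (which the '%E9' output in the question suggests, though that is an assumption), the conversion can also be done by hand without touching the locale. The following is a minimal sketch, not the library's own method: latin1_to_utf8 and percent_encode are hypothetical helper names introduced here for illustration. The first re-encodes each Latin-1 byte as UTF-8, the second percent-encodes every byte that is not an RFC 3986 unreserved character.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumes the input really is Latin-1, where each byte value equals
// the Unicode code point of the character it represents.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: one byte, unchanged.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: percent-encode every byte except the RFC 3986
// unreserved characters (letters, digits, '-', '.', '_', '~').
std::string percent_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With these, percent_encode(latin1_to_utf8("bar \xE9")) yields "bar%20%C3%A9". Note that this sketch also encodes the space as %20, unlike the "bar %C3%A9" in the question; whether the space should be left alone depends on what the Twitter library expects.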