c++, utf-8, character-encoding, character, ansi

How to convert ANSI to UTF-8?


I use Linux sockets to receive an HTTP response. The response contains some UTF-8 characters, and when I print them I get garbage characters, like this:

[ghostworker@ArchForXed b-client]$ ./get-http-response
HTTP/1.1 200 OK
Date: Tue, 14 Jul 2020 03:24:11 GMT
Content-Type: application/json; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Tengine
S-TIME-NS: 
S-TIME-CONN: 
S-TIME-PRE: 
S-TIME-START: 
S-TIME-TOTAL: 
ETag: "0215910f600c2a23e08f40035c3f881e"
Content-Encoding: gzip
Vary: Accept-Encoding
X-Cache-Webcdn: BYPASS from ks-sh-webcdn-25

b0
�
[ghostworker@ArchForXed b-client]$ 

I know that ANSI code cannot display UTF-8 code, how could I convert ANSI to UTF-8?


Solution

  • I know that ANSI code cannot display UTF-8 code, how could I convert ANSI to UTF-8?

    There is no such encoding as "ANSI". If you mean ASCII (aka ANSI_X3.4-1968), then there is no need to do anything because ASCII is also valid UTF-8 as is.
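
To illustrate why pure ASCII needs no conversion, here is a minimal sketch of a check that every byte falls in the 7-bit range. The function name `is_ascii` is only an illustrative choice, not something from a standard library. Any string that passes this check is byte-for-byte identical when read as UTF-8.

```cpp
#include <string>

// Returns true when every byte is in the 7-bit ASCII range (0x00-0x7F).
// Such a string is already valid UTF-8 and needs no conversion.
// `is_ascii` is a hypothetical helper name used for illustration.
bool is_ascii(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (c > 0x7F) {
            return false;  // a byte outside ASCII: some other encoding is in play
        }
    }
    return true;
}
```

Calling `is_ascii` on the response headers shown in the question should return true, whereas a body containing multi-byte UTF-8 sequences or compressed data would not pass.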

    If the content is already in UTF-8 (as the charset header implies), then converting to UTF-8 from another encoding makes no sense.

    You mention that you use Linux. If you meant that you want to convert from UTF-8 to ASCII, then I would like to point out that your terminal (emulator) is quite likely configured to use UTF-8, in which case such a conversion would be counterproductive. Also, note that characters which don't exist in the target encoding cannot be represented after the conversion.

    If you really do need to convert between UTF-8 and some other encoding (and that conversion is not from ASCII to UTF-8), then you'll find that C++ has no standard way to perform such a conversion. You can either read the specifications for the respective encodings and implement the conversion yourself, which is non-trivial and probably not something that would fit in a Stack Overflow answer, or (as is nearly always the better option) you can save time by using an implementation written by someone else.
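
As one example of "an implementation written by someone else", POSIX systems such as Linux provide `iconv`. The following is a rough sketch, assuming a stateless source encoding such as ISO-8859-1 (chosen only for illustration; the question never says which encoding is involved) and an iconv implementation that supports the requested names. The wrapper name `convert_encoding` is hypothetical, and the final flush call that stateful encodings require is omitted.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts `input` from the encoding named `from` to the encoding named `to`
// using POSIX iconv. `convert_encoding` is a hypothetical wrapper; which
// encoding names are accepted depends on the platform's iconv implementation.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target, then source
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed: unsupported conversion");
    }

    std::string output;
    char buffer[256];
    // iconv takes a non-const char** even though it does not modify the input.
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // capture before any other call can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the output buffer filled up, so keep looping.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}
```

With glibc this needs no extra library; on some other systems linking with `-liconv` may be required. A call such as `convert_encoding(text, "ISO-8859-1", "UTF-8")` would turn Latin-1 bytes into UTF-8.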


    What you probably need to do first is pay attention to this header:

    Content-Encoding: gzip

    From it, conclude that the response body is not text but the binary output of a compression algorithm, and that you need to decompress it to make it readable. There are no standard (de)compression functions in C++ either.
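
Before printing a body, a client can check whether it is compressed. The sketch below uses only the standard library and shows two checks: whether the header block declares `Content-Encoding: gzip`, and whether the payload starts with the gzip magic bytes `0x1F 0x8B`. Both function names are hypothetical. It assumes the headers and body have already been separated, and that the chunked framing (the `b0` size line in the output above) has been stripped from the body.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-cases a copy of `s` so header names can be matched case-insensitively.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true when some header line starts with "Content-Encoding:" and
// mentions gzip. `declares_gzip` is a hypothetical helper name.
bool declares_gzip(const std::string& headers) {
    std::istringstream lines(to_lower(headers));
    const std::string name = "content-encoding:";
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, name.size(), name) == 0 &&
            line.find("gzip", name.size()) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Returns true when `body` begins with the gzip magic number 0x1F 0x8B.
// `looks_like_gzip` is a hypothetical helper name.
bool looks_like_gzip(const std::string& body) {
    return body.size() >= 2 &&
           static_cast<unsigned char>(body[0]) == 0x1F &&
           static_cast<unsigned char>(body[1]) == 0x8B;
}
```

If either check succeeds, the bytes should go to a decompression library rather than to the terminal. One common choice is zlib, whose `inflateInit2` accepts a `windowBits` value of `16 + MAX_WBITS` to decode gzip framing.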