Tags: php, .net, json, guzzle

Cannot convert JSON response from Windows-1253 to UTF-8


I'm trying to parse a JSON response from a web service I have no control over.

These are the headers:

[image: response headers]

This is the body as I see it in PHP, with sensitive parts hidden:

[image: response body]

I'm using the Guzzle HTTP client to send the request and retrieve the response.

If I decode the body directly I get an empty object, so I assume a charset conversion is needed. I'm trying to convert the response contents like this:

json_decode(iconv($charset, 'UTF-8', $contents))

or

mb_convert_encoding($contents, 'UTF-8', $charset);

Both calls fail, with the following notice and warning respectively:

Notice: iconv(): Wrong charset, conversion from 'windows-1253' to 'UTF-8' is not allowed in Client.php on line 205

Warning: mb_convert_encoding(): Illegal character encoding specified in Client.php on line 208
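For reference, here is a minimal sketch of a conversion helper that makes this kind of failure explicit. It is not code from the question: `convertToUtf8` is a hypothetical name, and the fallback order (iconv first, then mbstring) is an assumption. It relies on two facts: `iconv()` returns `false` when it cannot convert, and mbstring ships its own fixed list of encodings (see `mb_list_encodings()`), which may not include Windows-1253.

```php
<?php
/**
 * Convert $data from $charset to UTF-8, or throw if no installed
 * converter knows that charset. Hypothetical helper for illustration.
 */
function convertToUtf8(string $data, string $charset): string
{
    // iconv() signals failure by returning false (plus a notice/warning),
    // so the return value has to be checked explicitly.
    if (function_exists('iconv')) {
        $converted = @iconv($charset, 'UTF-8', $data);
        if ($converted !== false) {
            return $converted;
        }
    }

    // Only hand the charset to mbstring if it is on mbstring's own list,
    // so no warning (or ValueError on PHP 8) is raised for unknown names.
    if (function_exists('mb_convert_encoding')) {
        $known = [];
        foreach (mb_list_encodings() as $name) {
            $known[strtolower($name)] = true;
            foreach (mb_encoding_aliases($name) ?: [] as $alias) {
                $known[strtolower($alias)] = true;
            }
        }
        if (isset($known[strtolower($charset)])) {
            return mb_convert_encoding($data, 'UTF-8', $charset);
        }
    }

    throw new RuntimeException("No converter available for charset '$charset'");
}
```

With a helper like this, an unsupported charset surfaces as one clear exception instead of a notice followed by `json_decode()` silently receiving `false`.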

I've used this piece of code successfully before, and I can't understand why it fails now.

Sending the same request with Postman retrieves the data correctly, without broken characters, and Postman appears to show the same headers and body.

Update, based on the comments:

mb_detect_encoding($response->getBody()) -> UTF-8

mb_detect_encoding($response->getBody()->getContents()) -> ASCII

json_last_error_msg() -> Malformed UTF-8 characters, possibly incorrectly encoded
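The JSON error above is easy to miss, because `json_decode()` just returns `null` on failure. A small sketch of a decoder that fails loudly instead (the name `decodeJsonOrFail` is hypothetical; it uses `json_last_error()` rather than `JSON_THROW_ON_ERROR` on the assumption that the PHP version may be older than 7.3):

```php
<?php
/**
 * Decode a JSON string into PHP arrays/scalars, throwing when the input
 * is not valid JSON (including when it is not valid UTF-8).
 */
function decodeJsonOrFail(string $json)
{
    $data = json_decode($json, true);

    // A null result alone is ambiguous (the document could be "null"),
    // so the error code is what decides success or failure.
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(json_last_error_msg(), json_last_error());
    }

    return $data;
}
```

Passing a body that still contains single-byte Greek characters, for example `"{\"name\":\"\xC1\"}"`, makes this throw with the code `JSON_ERROR_UTF8`, which is the same condition reported by `json_last_error_msg()` above.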

Additionally, as a trial-and-error attempt to detect the encoding, I tried every encoding iconv lists to see whether any of them could convert the response to UTF-8 without an error, using this method:

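    private function detectEncoding($str)
    {
        $iconvEncodings = [...]; // list of iconv encoding names, elided here

        foreach ($iconvEncodings as $encoding) {
            // iconv() reports failure with a notice and a false return value.
            // It only throws if an error handler turns notices into exceptions,
            // so test the result instead of relying on try/catch.
            if (@iconv($encoding, 'UTF-8', $str) !== false) {
                return $encoding;
            }
        }

        // Note: a successful conversion does not prove the encoding is the
        // right one; many single-byte encodings accept any byte sequence.
        return "unknown";
    }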

No encoding worked; every one produced the same error. I assume the problem lies in how Guzzle retrieves the JSON response rather than in iconv itself, since it is implausible that the response matches none of the 1000+ encodings.

Some more info, using cURL

I just retried the same payload using cURL:

    /**
     * @param $options
     * @return bool|string
     */
    public function makeCurlRequest($options)
    {
        $payload = json_encode($options);

        // Prepare new cURL resource
        $ch = curl_init($this->softoneurl);

        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,   // return the response as a string
            CURLOPT_HEADER => false,          // don't include headers in the output
            CURLOPT_FOLLOWLOCATION => true,   // follow redirects
            CURLOPT_MAXREDIRS => 10,          // stop after 10 redirects
            CURLOPT_ENCODING => "",           // accept all supported compression encodings
            CURLOPT_USERAGENT => "test",      // name of client
            CURLOPT_AUTOREFERER => true,      // set referrer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,    // time-out on connect (seconds)
            CURLOPT_TIMEOUT => 120,           // time-out for the whole request (seconds)
            CURLINFO_HEADER_OUT => true,      // keep the sent request headers for curl_getinfo()
            CURLOPT_POST => true,
            CURLOPT_POSTFIELDS => $payload,
        ]);

        // Set HTTP headers for the POST request
        curl_setopt($ch, CURLOPT_HTTPHEADER, [
            'Content-Type: application/json',
            'Content-Length: ' . strlen($payload),
        ]);

        // Submit the POST request
        $result = curl_exec($ch);

        // Close cURL session handle
        curl_close($ch);

        return $result;
    }

I received exactly the same string, and converting it failed in exactly the same way. Perhaps I'm missing an option?

Apparently something is wrong with iconv itself in this environment, and the problem is not application-specific. Running the following command over SSH

php -r "var_dump(iconv('Windows-1253', 'UTF-8', 'test'));"

yields

PHP Notice:  iconv(): Wrong charset, conversion from `Windows-1253' to `UTF-8' is not allowed in Command line code on line 1
PHP Stack trace:
PHP   1. {main}() Command line code:0
PHP   2. iconv(*uninitialized*, *uninitialized*, *uninitialized*) Command line code:1
Command line code:1:
bool(false)

Perhaps some dependency is missing.
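To narrow down an environment problem like this, it can help to compare what the iconv extension reports under each way of running PHP (CLI over SSH versus the web server). A minimal sketch, assuming only the standard `ICONV_IMPL` and `ICONV_VERSION` constants of the iconv extension; the function name `iconvDiagnostics` is made up for this example:

```php
<?php
/**
 * Collect basic facts about the iconv extension in the current PHP process
 * and test whether it can convert from $charset to UTF-8.
 */
function iconvDiagnostics(string $charset): array
{
    $loaded = extension_loaded('iconv');

    return [
        'sapi'     => PHP_SAPI, // e.g. "cli" or "fpm-fcgi"
        'loaded'   => $loaded,
        'impl'     => ($loaded && defined('ICONV_IMPL')) ? ICONV_IMPL : null,
        'version'  => ($loaded && defined('ICONV_VERSION')) ? ICONV_VERSION : null,
        // Plain ASCII input converts to itself in any ASCII-compatible
        // charset, so a mismatch here points at the converter, not the data.
        'converts' => $loaded && @iconv($charset, 'UTF-8', 'test') === 'test',
    ];
}
```

Calling `print_r(iconvDiagnostics('Windows-1253'));` in both contexts and comparing the results shows whether the two environments differ in their iconv implementation or only in what that implementation can load.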


Solution

  • About 14 hours of troubleshooting later, I can answer my own question. The code was running as a CLI command, and in that context the CLI php binary did not have access to some libraries that iconv needs.

    More specifically, it was missing the gconv libraries. In my case, on Debian 9, they were located in

    /usr/lib/x86_64-linux-gnu/gconv

    This folder contains many libraries, covering each encoding iconv can use. A good way to see this is to run the following command on a system where you have root access:

    strace iconv -f <needed_encoding> -t utf-8

    The output lists the many paths that iconv tries to open, including the gconv folder, and so shows which ones you need to make available in your SSH environment. If you don't have root access, you will have to ask your hosting provider.
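The same check can be sketched from inside PHP, which is useful when strace is not available. This assumes that the glibc module for Windows-1253 is a file named `CP1253.so` in the gconv folder; that file name is my assumption, not something stated above, so adjust it to whatever strace shows on a working system. `findGconvModule` is a hypothetical helper.

```php
<?php
/**
 * Return the first readable "<module>.so" found in the given directories,
 * or null if the current process cannot read it in any of them.
 */
function findGconvModule(array $dirs, string $module): ?string
{
    foreach ($dirs as $dir) {
        $path = rtrim($dir, '/') . '/' . $module . '.so';
        if (is_readable($path)) {
            return $path;
        }
    }

    return null;
}
```

For example, `findGconvModule(['/usr/lib/x86_64-linux-gnu/gconv'], 'CP1253')` returning `null` from the CLI over SSH, but a path from the web server, would match the situation described in the answer.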