PHP escape unicode characters only


The Facebook validation documentation says:

Please note that we generate the signature using an escaped unicode version of the payload, with lowercase hex digits. If you just calculate against the decoded bytes, you will end up with a different signature. For example, the string äöå should be escaped to \u00e4\u00f6\u00e5.

I'm trying to write a unit test for my validation, but I can't reproduce the signature because I can't escape the payload the way Facebook does. I've tried

mb_convert_encoding($payload, 'unicode')

But this re-encodes the entire payload instead of escaping only the non-ASCII characters, as Facebook does.

My full code:

// in the unit test
$content = file_get_contents(__DIR__.'/../Responses/whatsapp_webhook.json');
// trim whitespace at the end of the file
$content = trim($content);

$secret = config('externals.meta.config.app_secret');

$signature = hash_hmac(
    'sha256',
    mb_convert_encoding($content, 'unicode'),
    $secret
);

$response = $this->postJson(
    route('whatsapp.webhook.message'),
    json_decode($content, true),
    [
        'CONTENT_TYPE' => 'text/plain',
        'X-Hub-Signature-256' => $signature,
    ]
);
$response->assertOk();

// in the request validation
/**
 * @var string $signature
 */
$signature = $request->header('X-Hub-Signature-256');

if (!$signature) {
    abort(Response::HTTP_FORBIDDEN);
}

$signature = Str::after($signature, '=');
$secret = config('externals.meta.config.app_secret');

/**
 * @var string $content
 */
$content = $request->getContent();

$payloadSignature = hash_hmac(
    'sha256',
    $content,
    $secret
);

if ($payloadSignature !== $signature) {
    abort(Response::HTTP_FORBIDDEN);
}

Solution

  • For one, mb_convert_encoding($payload, 'unicode') converts the input to UTF-16BE, not UTF-8. You would want mb_convert_encoding($payload, 'UTF-8').

    For two, calling mb_convert_encoding() without specifying the source encoding makes the function assume that the input uses the current internal encoding (see mb_internal_encoding()), which is frequently incorrect and will mangle your data. You would want mb_convert_encoding($payload, 'UTF-8', $source_encoding). [Also, you cannot reliably detect a string's encoding; you need to know what it is.]

    For three, mb_convert_encoding() is entirely the wrong function to use to apply the desired escape sequences to the data. [and good lord are the google results for "php escape UTF-8" awful]
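
    To make the first and third points concrete, here is a small sketch of what a conversion to UTF-16BE does to the bytes. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the sample string is only for illustration.

    ```php
    <?php
    // Assumes this file is saved as UTF-8 and mbstring is available.
    $in = 'aä';

    // Converting to UTF-16BE re-encodes every character, plain ASCII included.
    $utf16 = mb_convert_encoding($in, 'UTF-16BE', 'UTF-8');

    var_dump(bin2hex($in));    // UTF-8 bytes:    61 c3 a4
    var_dump(bin2hex($utf16)); // UTF-16BE bytes: 00 61 00 e4
    ```

    Every character changes, and no \u escape sequences appear anywhere, so hashing this result cannot match a signature computed over the escaped payload.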

    Unfortunately, PHP doesn't have a UTF-8 escape function that isn't baked into another function, but it's not terribly difficult to write in userland.

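    function utf8_escape($input) {
        $output = '';
        // Pass the encoding explicitly rather than relying on mb_internal_encoding().
        for( $i=0,$l=mb_strlen($input, 'UTF-8'); $i<$l; ++$i ) {
            $cur = mb_substr($input, $i, 1, 'UTF-8');
            if( strlen($cur) === 1 ) {
                // Single-byte (ASCII) characters pass through unchanged.
                $output .= $cur;
                continue;
            }
            $cp = mb_ord($cur, 'UTF-8');
            if( $cp > 0xFFFF ) {
                // Assumption: code points above U+FFFF are written as a UTF-16
                // surrogate pair, as in JSON; \u only has room for four hex digits.
                $cp -= 0x10000;
                $output .= sprintf('\\u%04x\\u%04x', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
            } else {
                $output .= sprintf('\\u%04x', $cp);
            }
        }
        return $output;
    }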
    
    $in = "asdf äöå";
    
    var_dump(
        utf8_escape($in),
    );
    

    Output:

    string(23) "asdf \u00e4\u00f6\u00e5"
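
To tie this back to the unit test: the escaped payload is what should go into hash_hmac(). Below is a minimal sketch, not a definitive implementation. The names escape_non_ascii() and sign_payload() and the secret are hypothetical, the payload is assumed to be valid UTF-8, and the surrogate-pair handling for code points above U+FFFF assumes Facebook follows the JSON convention. The snippet defines its own compact escape helper so that it runs on its own.

```php
<?php
// Hypothetical helper: escape every non-ASCII character as \uXXXX (lowercase hex).
// Assumes $input is valid UTF-8.
function escape_non_ascii(string $input): string
{
    $escaped = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            $cp = mb_ord($m[0], 'UTF-8');
            if ($cp > 0xFFFF) {
                // Assumption: characters outside the BMP become a UTF-16 surrogate pair.
                $cp -= 0x10000;
                return sprintf('\\u%04x\\u%04x', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
            }
            return sprintf('\\u%04x', $cp);
        },
        $input
    );

    if ($escaped === null) {
        // preg_replace_callback() returns null on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Payload is not valid UTF-8');
    }

    return $escaped;
}

// Hypothetical helper: build the header value in the "sha256=<hex digest>" form.
function sign_payload(string $payload, string $secret): string
{
    return 'sha256=' . hash_hmac('sha256', escape_non_ascii($payload), $secret);
}

$payload = '{"text":"äöå"}';
$secret  = 'test-secret'; // placeholder, not a real app secret

echo escape_non_ascii($payload), PHP_EOL; // {"text":"\u00e4\u00f6\u00e5"}

$header = sign_payload($payload, $secret);
```

In the test, $header would be sent as the X-Hub-Signature-256 value. The validation side then has to hash the same escaped bytes; if it hashes the raw request body instead, the two digests only agree when the body that actually arrives is already in escaped form.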