character-encoding php-extension

Character encoding in a PHP extension


I'm currently writing a PHP extension in C++ with the Zend API. Basically, I write PHP_METHOD{..} wrappers around my native C++ interface methods and use "zend_parse_parameters(..)" to fetch the corresponding input arguments.

This extension contains methods which can take strings as arguments, such as a filename.

I know from http://php.net/manual/en/language.types.string.php#language.types.string.details that strings have no encoding in PHP. Still, can I expect the PHP programmer to use a function like "utf8_decode(..)" so that the extension can read the input strings correctly?

Or does the PHP programmer expect the extension to detect the encoding from the PHP script and handle strings accordingly?

Any help is highly appreciated! Thanks!


Solution

  • You are correct. Strings are just binary blobs in PHP. As the author of an extension, your options are:

    • Have the user hand your extension UTF-8: By far the best option. The user has to make the decision. Check that the string is valid UTF-8 and fail early if it is not.
    • Guess and convert the encoding yourself: You cannot know the meaning of the string. Because PHP strings are just binary blobs and carry no encoding information, you do not know what the intended content is. The data might come from a Windows file in a legacy encoding and have been concatenated with text in a completely different encoding. Worse, the bytes might happen to be valid UTF-8 without actually being UTF-8, in which case you would misinterpret them without the user knowing. Hence option 1: have the user pass UTF-8.
    • Alternative: Force the user to declare the input encoding.
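
For option 1, the "fail early" check could look roughly like the sketch below. It is a minimal, hand-written validator in plain C++, not part of the Zend API; the name is_valid_utf8 is hypothetical. It rejects stray or missing continuation bytes, overlong forms, UTF-16 surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Zend API function): returns true if `s` is
// well-formed UTF-8. Embedded NUL bytes are treated as ordinary ASCII.
bool is_valid_utf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = p[i];
        if (b0 <= 0x7F) {            // single-byte (ASCII) sequence
            ++i;
            continue;
        }
        std::size_t len = 0;         // total length of this sequence
        unsigned char lo = 0x80;     // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;                 // 0xC0 and 0xC1 would be overlong
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;   // reject overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;   // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;   // reject overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;   // reject code points > U+10FFFF
        } else {
            return false;            // stray continuation or invalid lead byte
        }
        if (n - i < len) return false;                      // truncated sequence
        if (p[i + 1] < lo || p[i + 1] > hi) return false;   // bad second byte
        for (std::size_t k = 2; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;    // bad continuation
        }
        i += len;
    }
    return true;
}
```

In a method wrapper, the bytes and length obtained from zend_parse_parameters could be copied into a std::string and passed to this check before the native method runs, with an error reported to the caller when it returns false. Note that a passing check only proves the bytes are well-formed: the two bytes 0xC3 0xA9 are valid UTF-8 for "é", but could equally be two ISO-8859-1 characters, which is exactly the ambiguity described in option 2.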

    Here is an example of alternative 3:

    $obj = new MyExtensionClass('UTF-8'); // force the caller to declare an encoding
    $obj->someMethod($inputStr); // the extension tries to convert at this point
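
On the C++ side, the conversion step of alternative 3 might be sketched as follows. This is only an illustration under stated assumptions: to_utf8 is a hypothetical helper, and it handles just two declared encodings ("UTF-8" as a pass-through and "ISO-8859-1"). A real extension would more likely delegate to a conversion library such as iconv or ICU.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from the encoding the caller
// declared to UTF-8. Only two encodings are handled in this sketch.
std::string to_utf8(const std::string& input,
                    const std::string& declared_encoding) {
    if (declared_encoding == "UTF-8") {
        // Pass-through; this sketch does not validate the bytes here.
        return input;
    }
    if (declared_encoding == "ISO-8859-1") {
        std::string out;
        out.reserve(input.size() * 2);
        for (unsigned char c : input) {
            if (c < 0x80) {
                // ASCII range is identical in both encodings.
                out.push_back(static_cast<char>(c));
            } else {
                // U+0080..U+00FF becomes a two-byte UTF-8 sequence.
                out.push_back(static_cast<char>(0xC0 | (c >> 6)));
                out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
            }
        }
        return out;
    }
    // Fail early on anything the extension does not know how to convert.
    throw std::invalid_argument("unsupported encoding: " + declared_encoding);
}
```

The design point is that the extension never guesses: the encoding name stored by the constructor is the only thing that decides how the bytes are interpreted, and an unknown name is rejected instead of being silently treated as some default.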
    

    The standard library uses approach 1. See json_encode as an example: