php, character-encoding, escaping, special-characters

PHP: How to encode U+FFFD in order to do a replace?


I'm trying to display a data feed on a page, and we're running into encoding issues with one odd character. For some reason the feed contains the U+FFFD (replacement) character, and htmlentities() will not escape it, so I need to replace it manually. (I'm using PHP 5.3.)

I've tried the following:

$string = str_replace( "\xFFFD",  "_", $string );
$string = str_replace( "\XFFFD",  "_", $string );
$string = str_replace( "\uFFFD",  "_", $string );
$string = str_replace("\x{FFFD}", "_", $string );
$string = str_replace("\X{FFFD}", "_", $string );
$string = str_replace("\P{FFFD}", "_", $string );
$string = str_replace("\p{FFFD}", "_", $string );

None of the above work.

After reading this page - http://php.net/manual/en/regexp.reference.unicode.php - I'm not sure what I'm doing wrong. Do I need to compile UTF-8 support into PCRE?


Solution

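  • str_replace() compares literal bytes and does not understand regex escapes such as \x{FFFD}, and in a double-quoted PHP string "\xFFFD" is just the byte 0xFF followed by the text "FD". Use preg_replace with the u (UTF-8) modifier instead, like this: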

    $string = preg_replace('@\x{FFFD}@u', '_', $string);
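As a rough sketch of the two approaches side by side, assuming the feed is UTF-8 encoded: U+FFFD is stored in UTF-8 as the three bytes EF BF BD, so str_replace can also work if it is given those bytes directly. The sample input and the variable names below are made up for illustration and are not from the original feed.

```php
<?php
// U+FFFD (REPLACEMENT CHARACTER) is encoded in UTF-8 as the bytes EF BF BD.
$replacementChar = "\xEF\xBF\xBD";

// Hypothetical sample input standing in for the feed text (assumed UTF-8).
$string = "caf" . $replacementChar . " menu";

// Byte-level replacement: no regex involved, so no PCRE UTF-8 support is needed.
$viaStrReplace = str_replace($replacementChar, '_', $string);

// Regex replacement: the u modifier makes PCRE treat pattern and subject as UTF-8,
// which is what allows \x{FFFD} to match the code point.
$viaPregReplace = preg_replace('/\x{FFFD}/u', '_', $string);

// With the u modifier, preg_replace returns null when the subject is not valid
// UTF-8, so fall back to the byte-level result in that case.
if ($viaPregReplace === null) {
    $viaPregReplace = $viaStrReplace;
}

echo $viaStrReplace, "\n";
echo $viaPregReplace, "\n";
```

The byte-level version may be the safer choice when the feed's encoding is not guaranteed, since the u modifier rejects subjects that are not valid UTF-8.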