I have the following piece of code which seems to be changing my character set.
$html = "à";
echo $html; // result: à
$html = preg_replace("/\s/", "", $html);
echo $html; // result: ?
However, when I use [\t\n\r\f\v] as my pattern instead of the special character \s, it works fine:
$html = "à";
echo $html; // result: à
$html = preg_replace("/[\t\n\r\f\v]/", "", $html);
echo $html; // result: à
Why is that?
I have the same problem. It is because of UTF-8: à is encoded as the two bytes 0xc3 0xa0, which you can write in PHP as "\xc3\xa0".
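If you want to check the bytes yourself, something along these lines should do it. This is only a small sketch; the string is written with explicit byte escapes so that the encoding of the source file does not matter.

```php
<?php
// "à" written as explicit UTF-8 bytes, independent of the file's encoding.
$html = "\xc3\xa0";

$hex = bin2hex($html);   // hexadecimal dump of the raw bytes
$len = strlen($html);    // strlen() counts bytes, not characters

echo $hex, "\n"; // c3a0
echo $len, "\n"; // 2
```

The second byte, 0xa0, is the one that matters for the explanation that follows.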
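Without the u modifier, PCRE processes the string byte by byte, and depending on the character tables (locale) in use, \s can match the lone byte 0xa0 as if it were the ISO-8859-1 (Latin-1) non-breaking space. Removing that byte leaves an orphaned 0xc3, which is not valid UTF-8 and is displayed as ?. Your [\t\n\r\f\v] pattern works because it lists only ASCII whitespace and never touches 0xa0.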
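You can use the u modifier to resolve the problem. It makes PCRE treat both the pattern and the subject as UTF-8, so à is matched as one character instead of two separate bytes: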
$html = preg_replace("/\s/u", "", $html);
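Here is a slightly fuller sketch of the same fix, again using explicit byte escapes so the result does not depend on the file encoding. The variable names $clean and $bad are just for illustration. It also shows one caveat I am aware of: with the u modifier, preg_replace() returns null when the subject is not valid UTF-8.

```php
<?php
// "à" as explicit UTF-8 bytes, followed by a space, a tab and a newline.
$html = "\xc3\xa0 \t\n";

// With the u modifier the subject is decoded as UTF-8, so \s is tested
// against the whole character U+00E0 rather than against its single bytes.
$clean = preg_replace('/\s/u', '', $html);

echo bin2hex($clean), "\n"; // c3a0 (à is kept, the real whitespace is gone)

// Caveat: a lone 0xa0 byte is not valid UTF-8, so the call fails with null.
$bad = preg_replace('/\s/u', '', "\xa0");
var_dump($bad); // NULL
```

If you cannot guarantee that the input is valid UTF-8, check the return value for null (preg_last_error() will report PREG_BAD_UTF8_ERROR) before using it.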