
PHP - preg_match() - matching substitution character black diamond with question mark


I have a problem with the substitution character (the black diamond with a question mark, �) in text I'm reading with SplFileObject. The character is already present in my text file, so nothing can be done to convert it to some other encoding. I decided to search for it with preg_match(), but PHP can't find any occurrence of it; PHP probably sees it as a different character than �. I don't want to just remove this character from the text, which is why I want to search for it with preg_match(). Is there any way to match this character in PHP?

I tried the regex /.�./i, but without success.


Solution

  • PHP with SplFileObject seems to read the file a little differently: instead of U+FFFD it detects U+0093 and U+0094. If you have the same problem I had, I suggest running hexdump on the file to see how the unrecognized character is actually encoded. After that, use the snippet recommended by @stribizhev in the comments to get the hex code as PHP sees it. Once you know the correct hex code of the unrecognized character (the conversion tool suggested by @stribizhev in the comments helps to get the right value), you can use the preg_...() functions. Here's the solution to my problem:

    // preg_replace() returns the new string, so assign the result back.
    $text = preg_replace("/[\x93\x94]/", "'", $text);
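
For anyone who wants to do the "find out what PHP really sees" step from inside PHP rather than with hexdump, here is a minimal sketch. The helper name non_ascii_bytes() and the sample string are made up for illustration; they are not from the snippet mentioned in the comments. It assumes the offending bytes are 0x93 and 0x94, as in the pattern above (in Windows-1252 those are the curly double quotes, but that is a guess about the file, not something confirmed here).

```php
<?php
// Hypothetical helper: list every byte in $text that is outside 7-bit ASCII,
// keyed by its byte offset, as two hex digits.
function non_ascii_bytes(string $text): array
{
    $found = [];
    // No /u modifier, so the pattern works on raw bytes, not on code points.
    if (preg_match_all('/[\x80-\xFF]/', $text, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$byte, $offset]) {
            $found[$offset] = bin2hex($byte);
        }
    }
    return $found;
}

// Sample input assumed for illustration: bytes 0x93 and 0x94 around "hi".
$text = "say \x93hi\x94 now";

// Show which bytes PHP actually holds and where they are.
print_r(non_ascii_bytes($text));

// Replace by byte value, using the same bytes as the solution's pattern.
$clean = preg_replace('/[\x93\x94]/', "'", $text);
echo $clean, PHP_EOL;
```

If the dump shows the consecutive bytes ef, bf, bd instead, the file really does contain U+FFFD encoded as UTF-8, and a pattern like /\x{FFFD}/u would be the one to try. Note that the /u modifier only works when the whole subject string is valid UTF-8.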