I'm trying to match the sentence "ça vous dit quoi" with this regex pattern:
$pattern = "/\b" . $value . "\b/";
The word boundaries work with everything except French-specific characters such as the ç at the beginning of ça. I can solve the word boundary problem by changing the PHP locale like this:
setlocale(LC_ALL, 'fr_FR');
When I do this, the pattern successfully matches the sentence, but all the French characters are then displayed as �, so I get:
�a vous dit quoi
Kind of annoying: solving one problem only creates another. I already have the HTML language and character set declared as:
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr" version="XHTML+RDFa 1.0" dir="ltr">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I'm not sure what else needs to be done to fix this. Surely French should display correctly with everything set to French?
EDIT: My server shows UTF-8 as the default character set for both the local and master values in phpinfo().
EDIT: This question is not a duplicate of the one suggested, because the questions themselves are quite different. The solution may be the same, but anyone searching Google for the kind of problem I had would not find that question, whereas they would find mine. I think people are starting to mark questions as duplicates just for the sake of it.
This other question is similar to mine in the same way, since the answer is the same: "regular expression for French characters". But by that logic all THREE questions would be duplicates.
It seems like it's a nightmare to fix the � display under the French locale, but I was able to fix the problem another way, by modifying the regex pattern instead. With 'u' added as a modifier, the pattern detects the French character ç in ça and everything works properly, with no need to change the locale.
From this:
$pattern = "/\b" . $value . "\b/";
to this:
$pattern = "/\b" . $value . "\b/u";
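For anyone who wants to try this, here is a minimal sketch of the change. It assumes the script file and the input strings are UTF-8 encoded; the function name matchesWholeWord and its $unicode flag are made up for illustration and are not part of my real code. I also added preg_quote(), which is not in my pattern above, so that special characters in $value cannot break the regex.

```php
<?php
// Hypothetical helper for illustration only (not from the original code).
// Assumes $value and $sentence are UTF-8 strings.
function matchesWholeWord(string $value, string $sentence, bool $unicode = true): bool
{
    // preg_quote() escapes regex metacharacters in $value, including the "/" delimiter.
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8,
    // so \b sees "ç" as a word character instead of two unrelated bytes.
    $pattern = '/\b' . preg_quote($value, '/') . '\b/' . ($unicode ? 'u' : '');

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $sentence) === 1;
}

$sentence = 'ça vous dit quoi';

// Without "u": on my setup this did not match unless the locale was changed.
var_dump(matchesWholeWord('ça', $sentence, false));

// With "u": matches without touching setlocale().
var_dump(matchesWholeWord('ça', $sentence, true));
```

Because setlocale() is never called in this version, the output encoding is left alone and the ç displays normally.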