I have the following PHP code:
$oldstring = 'oldword1, oldword2. Oldword1. Oldword2. OLDWORD1. OLDWORD2';
$results = array(array('old'=>'oldword1', 'new'=>'newword1'), array('old'=>'oldword2', 'new'=>'newword2'));
foreach ($results as $row) {
    $fndrep[$row['old']] = $row['new'];
}
$pattern = '~(?=([A-Z]?)([a-z]?))\b(?i)(?:'
    . implode('|', array_keys($fndrep))
    . ')\b~';
$newstring = preg_replace_callback($pattern, function ($m) use ($fndrep) {
    $lowm = $fndrep[strtolower($m[0])];
    if ($m[1])
        return ($m[2]) ? ucfirst($lowm) : strtoupper($lowm);
    else
        return $lowm;
}, $oldstring);
echo $newstring;
As you can see, it replaces all the old words with the new ones. The words in the results array must be given in lowercase only. It works perfectly for Latin characters whether "oldword" is in lowercase (oldword1, oldword2), capitalized (Oldword1, Oldword2) or in uppercase (OLDWORD1, OLDWORD2). But I need the same solution for Cyrillic.
If I change
$pattern = '~(?=([A-Z]?)([a-z]?))\b(?i)(?:'
to Unicode
$pattern = '~(?=([\x{0410}-\x{042F}]?)([\x{0430}-\x{044F}]?))\b(?i)(?:'
and
. ')\b~';
to
. ')\b~u';
it works for Cyrillic too, but only if "oldword" is in lowercase (oldword1, oldword2). It doesn't work if "oldword" is capitalized (Oldword1, Oldword2) or in uppercase (OLDWORD1, OLDWORD2).
Can anyone resolve the problem?
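One way to narrow this down is to check what the callback's array lookup receives for a capitalized Cyrillic match. The snippet below is only a sketch: the word pair is made up for illustration, and it assumes a UTF-8 source file and the mbstring extension. strtolower() is byte-oriented, so depending on the PHP version and locale it will normally leave Cyrillic bytes unchanged and the lookup key will not exist.

```php
<?php
// Hypothetical Cyrillic pair, lowercase as the question requires.
$fndrep = ['слово' => 'термин'];

// What $m[0] would hold for a capitalized match.
$match = 'Слово';

// Byte-oriented conversion: not aware of multibyte UTF-8 characters.
$byteKey = strtolower($match);

// Encoding-aware conversion from the mbstring extension.
$mbKey = mb_strtolower($match, 'UTF-8');

// Expected to be false whenever strtolower() leaves the Cyrillic bytes as they are.
var_dump(isset($fndrep[$byteKey]));

// The multibyte version produces the lowercase key.
var_dump(isset($fndrep[$mbKey])); // bool(true)
```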
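I've found the solution. It turns out that for Cyrillic we need mb_strtolower()/mb_strtoupper() instead of strtolower()/strtoupper(), plus a little extra code in place of ucfirst(), because those functions work byte by byte and do not handle multibyte UTF-8 characters. I'm surprised no one noticed it.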
...
$newstring = preg_replace_callback($pattern, function ($m) use ($fndrep) {
    mb_internal_encoding('UTF-8');
    $lowm = $fndrep[mb_strtolower($m[0])];
    if ($m[1])
        return ($m[2]) ?
            mb_strtoupper(mb_substr($lowm, 0, 1)) . mb_substr(mb_convert_case($lowm, MB_CASE_LOWER), 1, mb_strlen($lowm))
            : mb_strtoupper($lowm);
    else
        return $lowm;
}, $oldstring);
...
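For reference, here is how the pieces might fit together in one self-contained script. This is a sketch, not the exact code from above: the names mbUcfirst() and replaceKeepingCase() and the Cyrillic word pairs are made up for illustration, the hard-coded ranges are swapped for the Unicode properties \p{Lu} and \p{Ll} (which also cover letters such as Ё that fall outside U+0410..U+044F), and preg_quote() is added as a precaution for keys containing regex metacharacters. It assumes a UTF-8 source file and the mbstring extension.

```php
<?php
// Hypothetical helper: uppercase only the first character of a UTF-8 string.
function mbUcfirst(string $s): string
{
    return mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8')
        . mb_substr($s, 1, null, 'UTF-8');
}

// Hypothetical wrapper: replace the keys of $map (all lowercase) in $text,
// carrying over the case style of each matched word.
function replaceKeepingCase(string $text, array $map): string
{
    // Escape each key so it is matched literally inside the pattern.
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '~');
    }, array_keys($map));

    // Group 1: optional uppercase first letter; group 2: optional lowercase letter after it.
    $pattern = '~(?=(\p{Lu}?)(\p{Ll}?))\b(?i)(?:'
        . implode('|', $alternatives)
        . ')\b~u';

    return preg_replace_callback($pattern, function ($m) use ($map) {
        $new = $map[mb_strtolower($m[0], 'UTF-8')];
        if ($m[1] === '') {
            return $new;                       // match started with a lowercase letter
        }
        return $m[2] !== ''
            ? mbUcfirst($new)                  // capital letter followed by lowercase
            : mb_strtoupper($new, 'UTF-8');    // capital letter not followed by lowercase
    }, $text);
}

$map = ['старое' => 'новое', 'слово' => 'термин'];
echo replaceKeepingCase('старое слово. Старое Слово. СТАРОЕ СЛОВО', $map), "\n";
// prints: новое термин. Новое Термин. НОВОЕ ТЕРМИН
```

Passing 'UTF-8' explicitly to each mb_* call avoids calling mb_internal_encoding() on every callback invocation; setting the internal encoding once at the top of the script would work just as well.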