I need to strip every character that is not a letter (in any language, UTF-8 encoded) from the start of a string. My early test worked fine until I started using non-ASCII UTF-8 letters:
<?php
$s = '\$5ı龢abc';
echo '<p>'.$s.'</p>';
while (!preg_match('/([\p{L}]+)/u', $s[0]))
{
    $s = substr($s, 1);
    echo '<p>'.$s.'</p>';
}
?>
This currently outputs the following:
\$5ı龢abc
$5ı龢abc
5ı龢abc
ı龢abc
�龢abc
龢abc
��abc
�abc
abc
I would like the final output to be: ı龢abc. I'm not quite sure what I'm missing, however.
Indexing individual characters doesn't work, because PHP has no notion of "characters" in strings; $s[0] and substr() both operate on bytes. That breaks with multi-byte characters: each iteration strips a single byte, so the loop cuts ı and 龢 apart and leaves invalid UTF-8 behind. But you're doing it far too manually anyway; just replace all non-letter characters at the beginning of the string:
$s = preg_replace('/^\P{L}*/u', '', $s);
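To make the byte-versus-character point concrete, here is a minimal sketch that wraps the replacement above in a function and checks it against the string from the question. The helper name ltrim_non_letters is hypothetical, and the sketch assumes the source file is saved as UTF-8 and that PCRE has Unicode property support (the usual case for stock PHP builds). It uses + instead of *, which gives the same result here.

```php
<?php
// Hypothetical helper (the name is not from the original post):
// strip everything before the first letter of a UTF-8 string.
function ltrim_non_letters(string $s): string
{
    // ^ anchors at the start of the string; \P{L} matches any code point
    // that is NOT a letter; /u makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/^\P{L}+/u', '', $s);

    // preg_replace() returns null on error (for example, invalid UTF-8
    // input), so fall back to the unchanged input in that case.
    return $result ?? $s;
}

$s = '\$5ı龢abc';

// strlen() counts bytes: \ $ 5 (3) + ı (2) + 龢 (3) + abc (3).
echo strlen($s), "\n";          // 11

// $s[3] is only the first byte of ı, not the whole character.
echo bin2hex($s[3]), "\n";      // c4

echo ltrim_non_letters($s), "\n"; // ı龢abc
```

This also explains why the original loop ran straight through 龢: once $s[0] is a lone byte of a multi-byte sequence, preg_match() with /u fails on the invalid UTF-8 and returns false, which the ! turns into "keep trimming".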