How can I convert Turkish characters to ASCII (for example, ş to s)?
I tried replace but it didn't do anything. Here is my code:
$posta = $posta.ToLower()
$posta = $posta -replace "ü","u"
$posta = $posta -replace "ı","i"
$posta = $posta -replace "ö","o"
$posta = $posta -replace "ç","c"
$posta = $posta -replace "ş","s"
$posta = $posta -replace "ğ","g"
$posta = $posta.trim()
write-host $posta
If $posta is eylül, the output is still eylül.
All credit goes to this answer, combined with the comment on the same answer, which shows the appropriate way to do it: keep only the characters whose Unicode category is not NonSpacingMark, then replace ı with i. That answer is in C#, so here is how it can be done in PowerShell.
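To see why filtering on NonSpacingMark works, here is a minimal sketch that inspects what FormD normalization does to a single character. The variable names are only illustrative, and the characters are written as code points on the assumption that the script file's encoding may not be reliable.

```powershell
# U+00FC is ü; written as a code point so file encoding does not matter.
$u = [string][char]0x00FC
$decomposed = $u.Normalize([Text.NormalizationForm]::FormD)

$decomposed.Length                             # 2: 'u' followed by U+0308
[char]::GetUnicodeCategory($decomposed[1])     # NonSpacingMark

# U+0131 is dotless ı; it has no decomposition, so FormD leaves it unchanged.
$dotless = [string][char]0x0131
$dotless.Normalize([Text.NormalizationForm]::FormD) -ceq $dotless   # True
```

Because ı survives normalization untouched, it is the one character that still needs an explicit replacement.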
The original answer uses Enumerable.Where, which in PowerShell would look like this:
$posta = 'üıöçşğ'
[string]::new([System.Linq.Enumerable]::Where(
    [char[]] $posta.Normalize([Text.NormalizationForm]::FormD),
    [Func[char, bool]]{ [char]::GetUnicodeCategory($args[0]) -ne [Globalization.UnicodeCategory]::NonSpacingMark })).
    Replace('ı', 'i')
However, LINQ syntax is quite cumbersome in PowerShell: LINQ methods cannot be called as extension methods, so the static APIs have to be called directly. A relatively easier approach is to use the .Where intrinsic method:
$posta = 'üıöçşğ'
[string]::new($posta.Normalize([Text.NormalizationForm]::FormD).ToCharArray().
    Where{ [char]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark }).
    Replace('ı', 'i')
A simpler regex approach uses the -replace operator (thanks to mklement0 for the tip):
$posta = 'üıöçşğ'
$posta.Normalize('FormD') -replace '\p{M}' -creplace 'ı', 'i'
See Unicode category or Unicode block: \p{} for details.
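For reuse, the regex approach could be wrapped in a small function. This is only a sketch: the function name ConvertTo-TurkishAscii and its parameter are hypothetical and do not come from the answers above. It writes ı as a code point rather than a literal, on the assumption that a script file read with the wrong encoding may be one reason literal replacements appear to do nothing.

```powershell
# Hypothetical helper; the name and parameter are illustrative only.
function ConvertTo-TurkishAscii {
    param([Parameter(Mandatory)][string] $Text)

    # U+0131 (dotless ı) as a code point, independent of the file's encoding.
    $dotlessI = [string][char]0x0131

    # FormD splits letters such as ü into a base letter plus a combining mark.
    $decomposed = $Text.Normalize([Text.NormalizationForm]::FormD)

    # \p{M} matches the combining marks; removing them keeps the base letters.
    # Dotless ı has no decomposition, so it is mapped to i explicitly.
    $decomposed -replace '\p{M}' -creplace $dotlessI, 'i'
}

# The question's example, eylül, with ü written as U+00FC:
ConvertTo-TurkishAscii ("eyl" + [char]0x00FC + "l")   # eylul
```

The case-sensitive -creplace is kept from the one-liner above so that only the lowercase dotless ı is mapped to i.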