Tags: powershell, character-encoding, ascii

How can I turn Turkish chars to ASCII?


How can I convert Turkish characters to ASCII (for example, ş to s)?

I tried -replace, but it didn't do anything. Here is my code:

$posta = $posta.ToLower()
$posta = $posta -replace "ü","u" 
$posta = $posta -replace "ı","i"
$posta = $posta -replace "ö","o"
$posta = $posta -replace "ç","c"
$posta = $posta -replace "ş","s"
$posta = $posta -replace "ğ","g"
$posta = $posta.trim()
write-host $posta

If $posta is eylül, it still returns eylül.


Solution

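  • All credit goes to this answer, combined with the comment on that same answer, which shows the appropriate way to do it: decompose the string with Unicode normalization form D (FormD), which splits a letter such as ü into the base letter u plus a combining mark (U+0308), keep only the characters whose category is not NonSpacingMark, and then replace ı with i. The dotless ı (U+0131) has no decomposition, so it has to be replaced explicitly. That answer is not written in PowerShell, hence sharing how it can be done in PowerShell.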
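
    To see why this works, the snippet below (a small illustration, not part of the original answer) prints the code points each Turkish letter decomposes into under FormD. The letters are built from [char] code points so the result does not depend on how the script file is encoded. Every letter except the dotless ı splits into an ASCII base letter followed by a combining mark.

    ```powershell
    # Turkish letters as code points: u-umlaut, dotless i, o-umlaut, c-cedilla, s-cedilla, g-breve
    $letters = 0x00FC, 0x0131, 0x00F6, 0x00E7, 0x015F, 0x011F

    $report = foreach ($cp in $letters) {
        # Decompose the single letter with normalization form D
        $decomposed = ([string][char]$cp).Normalize([Text.NormalizationForm]::FormD)
        # Format every resulting char as U+XXXX
        $codes = foreach ($c in $decomposed.ToCharArray()) { 'U+{0:X4}' -f [int]$c }
        'U+{0:X4} -> {1}' -f $cp, ($codes -join ' ')
    }
    $report
    # U+00FC -> U+0075 U+0308   (u + combining diaeresis)
    # U+0131 -> U+0131          (dotless i: no decomposition)
    # U+00F6 -> U+006F U+0308
    # U+00E7 -> U+0063 U+0327
    # U+015F -> U+0073 U+0327
    # U+011F -> U+0067 U+0306
    ```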

    The original answer uses Enumerable.Where, which in PowerShell would look like this:

    $posta = 'üıöçşğ'
    # Decompose with FormD, keep only chars that are not NonSpacingMark, then map dotless 'ı' to 'i'
    [string]::new([System.Linq.Enumerable]::Where(
        [char[]] $posta.Normalize([Text.NormalizationForm]::FormD),
        [Func[char, bool]]{ [char]::GetUnicodeCategory($args[0]) -ne [Globalization.UnicodeCategory]::NonSpacingMark })).
        Replace('ı', 'i')
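
    For comparison, here is the same filter written as a plain foreach loop with System.Text.StringBuilder, avoiding both LINQ and the intrinsic methods. This is an illustrative sketch, not from the original answer; the input is the question's eylül, built from code points.

    ```powershell
    # "eylül" from the question, built from code points (U+00FC = u-umlaut)
    $posta = 'eyl' + [char]0x00FC + 'l'

    $sb = [System.Text.StringBuilder]::new()
    foreach ($c in $posta.Normalize([Text.NormalizationForm]::FormD).ToCharArray()) {
        # Keep every char that is not a combining (non-spacing) mark
        if ([char]::GetUnicodeCategory($c) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($c)
        }
    }
    # Dotless i (U+0131) has no decomposition, so map it explicitly
    $result = $sb.ToString().Replace([string][char]0x0131, 'i')
    $result   # eylul
    ```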
    

    However, LINQ syntax is quite cumbersome in PowerShell: PowerShell doesn't support extension-method syntax, so the Enumerable APIs have to be called directly as static methods. A relatively easier approach is to use the intrinsic .Where() method:

    $posta = 'üıöçşğ'
    [string]::new($posta.Normalize([Text.NormalizationForm]::FormD).ToCharArray().
        Where{ [char]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark }).
        Replace('ı', 'i')
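
    The same .Where() filter also covers uppercase input, because the uppercase letters decompose too: İ (U+0130) becomes I + U+0307 and Ş (U+015E) becomes S + U+0327. The sample text below is an assumption for illustration, and this sketch joins the kept characters with -join instead of [string]::new.

    ```powershell
    # "İSTANBUL ŞİŞLİ" built from code points: İ = U+0130, Ş = U+015E
    $text = "$([char]0x0130)STANBUL $([char]0x015E)$([char]0x0130)$([char]0x015E)L$([char]0x0130)"

    # Same .Where() filter as above; -join turns the kept chars back into a string
    $kept = $text.Normalize([Text.NormalizationForm]::FormD).ToCharArray().Where{
        [char]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark
    }
    $ascii = (-join $kept).Replace([string][char]0x0131, 'i')
    $ascii   # ISTANBUL SISLI
    ```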
    

    A simpler approach uses the -replace operator (thanks to mklement0 for the tip):

    $posta = 'üıöçşğ'
    # '\p{M}' deletes every combining mark; -creplace (case-sensitive) maps dotless 'ı' to 'i'
    $posta.Normalize('FormD') -replace '\p{M}' -creplace 'ı', 'i'
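
    To plug this into the question's code, the one-liner can be wrapped in a function. ConvertTo-AsciiTurkish is a hypothetical name chosen for this sketch, and ToLowerInvariant() is used instead of ToLower() as an assumption, so the lowercasing does not depend on the current culture.

    ```powershell
    # Hypothetical helper name; wraps the -replace one-liner from the answer
    function ConvertTo-AsciiTurkish {
        param(
            [Parameter(Mandatory, ValueFromPipeline)]
            [string] $Text
        )
        process {
            # FormD splits e.g. u-umlaut into 'u' + U+0308; '\p{M}' deletes the marks,
            # then -creplace maps dotless i (U+0131), which has no decomposition, to 'i'
            $Text.Normalize('FormD') -replace '\p{M}' -creplace ([string][char]0x0131), 'i'
        }
    }

    # Question-style usage; "EYLÜL " (with a trailing space) is built from code points
    $posta = 'EYL' + [char]0x00DC + 'L '
    $posta = (ConvertTo-AsciiTurkish ($posta.ToLowerInvariant())).Trim()
    Write-Host $posta   # eylul
    ```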
    

    See "Unicode category or Unicode block: \p{}" in the .NET regular-expression character class documentation for details.
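
    \p{M} matches any mark category (Mn, Mc and Me), while \p{Mn} matches only NonSpacingMark, the category the two earlier versions filter on, and \p{IsCombiningDiacriticalMarks} is the named Unicode block U+0300 to U+036F. For the Turkish letters all three give the same result, since their combining marks (U+0306, U+0308, U+0327) are non-spacing and lie in that block. A small comparison sketch, with the sample built from code points:

    ```powershell
    # Sample "üıöçşğ" built from code points so the file encoding does not matter
    $sample   = -join ([char[]](0x00FC, 0x0131, 0x00F6, 0x00E7, 0x015F, 0x011F))
    $nfd      = $sample.Normalize('FormD')
    $dotlessI = [string][char]0x0131

    $byCategory   = $nfd -replace '\p{M}'  -creplace $dotlessI, 'i'   # any mark category
    $byNonSpacing = $nfd -replace '\p{Mn}' -creplace $dotlessI, 'i'   # NonSpacingMark only
    $byBlock      = $nfd -replace '\p{IsCombiningDiacriticalMarks}' -creplace $dotlessI, 'i'   # U+0300 to U+036F

    $byCategory, $byNonSpacing, $byBlock   # uiocsg (three times)
    ```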