Tags: powershell, unicode, ascii, special-characters, diacritics

Converting Unicode string to ASCII


I have strings containing characters that are not in ASCII, such as á, é, í, ó, ú, and I need a function to convert them into something acceptable, such as a, e, i, o, u. This is because I will be creating IIS web sites from those strings (that is, I will be using them as domain names).


Solution

    function Convert-DiacriticCharacters {
        param(
            [string]$inputString
        )
        # Decompose each accented character into its base character plus combining marks.
        [string]$formD = $inputString.Normalize(
                [System.Text.NormalizationForm]::FormD
        )
        $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
        $stringBuilder = New-Object System.Text.StringBuilder
        for ($i = 0; $i -lt $formD.Length; $i++) {
            $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
            # Keep everything except the combining marks (the diacritics).
            if ($unicodeCategory -ne $nonSpacingMark) {
                $stringBuilder.Append($formD[$i]) | Out-Null
            }
        }
        # Recompose whatever is left into the canonical composed form.
        $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
    }
    

    The resulting function converts diacritics in the following way:

    PS C:\> Convert-DiacriticCharacters "Ångström"
    Angstrom
    PS C:\> Convert-DiacriticCharacters "Ó señor"
    O senor
    

    Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html
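
    Since the question is about domain names, note that the output above can still contain characters a host name does not accept, such as the space in "O senor". The following is only a minimal sketch of one way to go a step further, not part of the linked post: `ConvertTo-HostLabel` is a hypothetical name, and the rule of turning every run of other characters into a single hyphen is an assumption about what is acceptable for your sites. It does not enforce the 63-character limit on a DNS label.

```powershell
# Hypothetical helper (not from the original answer): reduce a string to
# lower-case letters, digits and hyphens so it can be used as one host-name label.
function ConvertTo-HostLabel {
    param(
        [string]$InputString
    )
    # Decompose accented characters, then drop the non-spacing marks
    # (same idea as Convert-DiacriticCharacters, written with a regex).
    $formD = $InputString.Normalize([System.Text.NormalizationForm]::FormD)
    $stripped = $formD -creplace '\p{Mn}', ''

    # Lower-case, then collapse every run of characters outside a-z and 0-9
    # into a single hyphen. -creplace keeps the match case-sensitive.
    $label = $stripped.ToLowerInvariant() -creplace '[^a-z0-9]+', '-'

    # A label may not start or end with a hyphen.
    $label.Trim('-')
}
```

    PS C:\> ConvertTo-HostLabel "Ångström"
    angstrom
    PS C:\> ConvertTo-HostLabel "Ó señor"
    o-senor

    Letters that have no canonical decomposition, such as ø or ß, are not changed by normalization, so in this sketch they fall under the hyphen rule; map them explicitly first if you want a letter instead.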