powershell, encoding, utf-8, ascii

PowerShell: prevent UTF-8 conversion


When I try to manipulate an .ini file with PowerShell, it always switches the encoding to UTF-8.

My Code:

Get-Content -Path "./update.ini" -Encoding ascii | Out-File -FilePath "ascii_update.ini" -Encoding ascii

The file needs to stay ASCII, so how can I disable this behaviour, or how can I switch the file back to ASCII?


Solution

  • German characters will not be shown correctly

    Given that you don't want UTF-8 encoding yet you want German umlauts, what you're looking for is ANSI encoding, not ASCII.

    • In Windows PowerShell, ANSI encoding is the default encoding used by Get-Content and Set-Content (but not Out-File, which defaults to "Unicode" (UTF-16LE)), so all you need is the following:
    # Windows PowerShell: ANSI encoding is the default for Get-Content / Set-Content
    Get-Content ./update.ini | Set-Content ascii_update.ini
    
    • In PowerShell (Core) 7+, (BOM-less) UTF-8 is now the default across all cmdlets, so you must request ANSI encoding explicitly, using -Encoding:

      • Unfortunately, whereas Default refers to the system's active ANSI encoding in Windows PowerShell, in PowerShell (Core) it now refers to UTF-8, and in versions before 7.4 there is no predefined ANSI enumeration value to complement the OEM value; this baffling omission is discussed in GitHub issue #6562. (PowerShell 7.4 and later accept -Encoding ansi.)

      • Therefore, you must determine the active ANSI code page explicitly, as shown below.

    $ansiEnc = [cultureinfo]::CurrentCulture.TextInfo.ANSICodePage
    Get-Content -Encoding $ansiEnc ./update.ini |
      Set-Content -Encoding $ansiEnc ascii_update.ini
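
If the same script has to run in both editions, the two approaches above can be combined. The following is a minimal sketch, not part of the original answer: the variable names and the temporary file names are assumptions chosen for illustration, and the sample content is ASCII-only so that the result does not depend on the machine's code page.

```powershell
# Choose an -Encoding argument that means "active ANSI code page" in either edition.
$ansiEncoding = if ($PSVersionTable.PSEdition -eq 'Core') {
    # PowerShell (Core) 7+: pass the numeric code page (e.g. 1252).
    [cultureinfo]::CurrentCulture.TextInfo.ANSICodePage
} else {
    # Windows PowerShell: 'Default' already refers to the active ANSI code page.
    'Default'
}

# Hypothetical sample files in the temp directory (illustration only).
$src = Join-Path ([System.IO.Path]::GetTempPath()) 'update.ini'
$dst = Join-Path ([System.IO.Path]::GetTempPath()) 'ansi_update.ini'

Set-Content -Encoding $ansiEncoding -LiteralPath $src -Value '[Main]', 'Name=ASCII only'

# Read and write with the same encoding so that no transcoding takes place.
Get-Content -Encoding $ansiEncoding -LiteralPath $src |
  Set-Content -Encoding $ansiEncoding -LiteralPath $dst
```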
    

    notepad shows it as UTF-8 on the bottom right side.

    ASCII encoding is a subset of UTF-8 encoding, which is why most editors show pure ASCII files as UTF-8, because they are by definition also valid UTF-8 files.
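
To see why an editor cannot tell the two apart, you can compare the bytes that each encoding produces. This is a small sketch using the .NET encoding classes; the sample strings are assumptions, and the non-ASCII character is built from its code point so that the script file's own encoding does not matter.

```powershell
$text = 'Name=Value'   # pure ASCII sample text

$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)   # GetBytes() emits no BOM

# For ASCII-only text, both encodings yield the same byte sequence.
$sameBytes = ($asciiBytes -join ',') -eq ($utf8Bytes -join ',')
$sameBytes   # True

# A non-ASCII character such as a-umlaut (U+00E4) needs two bytes in UTF-8.
$umlaut = [string][char]0xE4
$umlautUtf8Length = [System.Text.Encoding]::UTF8.GetBytes($umlaut).Length
$umlautUtf8Length   # 2
```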

    Note that if you save or read text that contains non-ASCII characters with -Encoding ASCII, the non-ASCII characters are "lossily" transcoded to verbatim ? characters.
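
The loss can be reproduced in memory with the .NET ASCII encoder, which by default substitutes ? for anything outside the ASCII range. This is only an illustrative sketch; the sample word is an assumption and is assembled from code points to keep the script itself ASCII-only.

```powershell
# Build 'Gr' + u-umlaut + sharp-s + 'e' without non-ASCII characters in the script.
$original = 'Gr' + [char]0xFC + [char]0xDF + 'e'

# Encode to ASCII bytes and decode again: the two non-ASCII characters are lost.
$bytes = [System.Text.Encoding]::ASCII.GetBytes($original)
$roundTripped = [System.Text.Encoding]::ASCII.GetString($bytes)

$roundTripped   # Gr??e
```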


    Optional reading: managing INI files as UTF-16LE ("Unicode")-encoded files, as supported by Windows API functions:

    zett42 points out that the WritePrivateProfileString and GetPrivateProfileString Windows API functions interpret INI files as follows:

    • If a file has a UTF-16LE ("Unicode") BOM, it is read and updated as such.

    • Otherwise, it is invariably interpreted as ANSI-encoded (even if it has a different Unicode encoding's BOM, such as UTF-8).
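
To predict which of the two cases applies to a given file, you can inspect its first two bytes. The helper below is a sketch under assumptions: Test-Utf16LeBom is a hypothetical name, not an existing cmdlet, and the check only looks at the byte pattern FF FE; it says nothing about how the API functions themselves are implemented.

```powershell
function Test-Utf16LeBom {
    param([Parameter(Mandatory)][string] $LiteralPath)

    # .NET file APIs need a full path, because they ignore PowerShell's current location.
    $fullPath = Convert-Path -LiteralPath $LiteralPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $buffer = New-Object byte[] 2
        $read = $stream.Read($buffer, 0, 2)
        # UTF-16LE BOM is FF FE. Caveat: a UTF-32LE BOM (FF FE 00 00) starts the same way.
        return ($read -eq 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE)
    } finally {
        $stream.Dispose()
    }
}

# Usage with two hypothetical sample files in the temp directory.
$utf16File = Join-Path ([System.IO.Path]::GetTempPath()) 'bom_utf16.ini'
$plainFile = Join-Path ([System.IO.Path]::GetTempPath()) 'bom_none.ini'
[System.IO.File]::WriteAllBytes($utf16File, [byte[]](0xFF, 0xFE))
[System.IO.File]::WriteAllText($plainFile, '[Main]', [System.Text.Encoding]::ASCII)

Test-Utf16LeBom $utf16File   # True
Test-Utf16LeBom $plainFile   # False
```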

    If you let WritePrivateProfileString create an INI file implicitly, it is always created without a BOM, and therefore treated as ANSI-encoded (even if you use the Unicode version of the API function). If you try to write non-ANSI-range Unicode characters to such a file, they are quietly and lossily transcoded as follows: either to an ASCII-range equivalent, for accented letters, if applicable (e.g., ă is transcoded to a); otherwise, to verbatim ?.

    Thus, creating the INI file of interest explicitly with a UTF-16LE BOM is necessary in order to maintain the file as UTF-16LE-encoded and therefore enable full Unicode support.

    For instance, you could create the INI file initially with a command such as Set-Content -Encoding Unicode ./update.ini -Value @(), which creates an (otherwise) empty file that contains only a UTF-16LE BOM, and then stick with -Encoding Unicode if you need to manipulate the file directly.
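
If you prefer to make the BOM explicit, the same starting point can be produced by writing the two BOM bytes directly with .NET and then verifying them. This is an alternative sketch, not the command from the answer; the file name and location are assumptions for illustration.

```powershell
# Hypothetical target file in the temp directory (a full path is required by the .NET APIs).
$iniPath = Join-Path ([System.IO.Path]::GetTempPath()) 'update_utf16.ini'

# Encoding.Unicode is UTF-16LE; its preamble is the BOM, i.e. the bytes FF FE.
$bom = [System.Text.Encoding]::Unicode.GetPreamble()
[System.IO.File]::WriteAllBytes($iniPath, $bom)

# Verify: the file should consist of exactly the two BOM bytes.
$fileBytes = [System.IO.File]::ReadAllBytes($iniPath)
$bomAsHex = '{0:X2} {1:X2}' -f $fileBytes[0], $fileBytes[1]
$bomAsHex   # FF FE
```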

    This MIT-licensed Gist (authored by me) contains module file IniFileHelper.psm1, whose Get-IniValue and Set-IniValue functions wrap the above-mentioned Windows API functions, with the crucial difference that when Set-IniValue implicitly creates an INI file it uses UTF-16LE encoding.

    The following, self-contained example demonstrates this:

    # Download the module code and import it via a session-scoped, dynamic module.
    # IMPORTANT: 
    #   While I can personally assure you that doing this is safe,
    #   you should always check the source code yourself first.
    $null = New-Module -Verbose -ScriptBlock ([scriptblock]::Create((Invoke-RestMethod 'https://gist.githubusercontent.com/mklement0/006c2352ddae7bb05693be028240f5b6/raw/1e2520810213f76f2e8f419d0e48892a4009de6a/IniFileHelper.psm1')))
    
    # Implicitly create file "test.ini" in the current directory,
    # and write key "TestKey" to section "Main", with a value
    # that contains an ASCII-range character, an ANSI-range character,
    # and a character beyond either of these two ranges.
    Set-IniValue test.ini Main TestKey 'haäă'
    
    # Now retrieve the same entry, which should show the exact
    # same value, 'haäă'
    # Note: If there is a preexisting "test.ini" file that does NOT
    #       have a UTF-16LE BOM, the non-ANSI 'ă' character would be
    #       "best-fit" transcoded to ASCII 'a'.
    #       Other non-ANSI characters that do not have ASCII-range analogs
    #       would be lossily transcoded to verbatim '?'
    Get-IniValue test.ini Main TestKey