
Casting int to chars in PowerShell has unexpected results


I am trying to generate a string with one of every ASCII character. I started with

32..255| %{[char]$_ | Out-File -filepath .\outfile.txt -Encoding ASCII -Append}

I expected the list of printable characters, but I got different characters.

Can anyone point me to either a better way to get my expected result or an explanation as to why I'm getting these results?


Solution

  • [char[]] (32..255) | Set-Content outfile.txt
    

    In Windows PowerShell this will create an "ANSI"-encoded file. "ANSI" is an umbrella term for the set of fixed-width, single-byte, 8-bit encodings on Windows that are supersets of ASCII. The specific "ANSI" encoding in use is implied by the code page associated with the legacy system locale in effect on your system[1]; e.g., Windows-1252 on US-English systems.
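
    If you want to check which "ANSI" code page is in effect on your system, the following sketch works in Windows PowerShell (in PowerShell (Core) 7+, [System.Text.Encoding]::Default reports UTF-8 instead):

    # Windows PowerShell: inspect the active "ANSI" code page.
    [System.Text.Encoding]::Default.CodePage      # e.g., 1252 on US-English systems
    [System.Text.Encoding]::Default.EncodingName  # e.g., "Western European (Windows)"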

    See the bottom section for why "ANSI" encoding should be avoided.

    If you were to do the same thing in PowerShell (Core) 7+, you'd get a UTF-8-encoded file without a BOM, which is the best encoding to use for cross-platform and cross-locale compatibility.

    In Windows PowerShell, adding -Encoding utf8 would give you a UTF-8 file too, but with a BOM.[2]

    If you used -Encoding Unicode or simply used the redirection operator > or Out-File, you'd get a UTF-16LE-encoded file.
    (In PowerShell (Core), by contrast, > produces BOM-less UTF-8 by default, because the latter is the consistently applied default encoding.)
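
    As a sketch of these per-edition defaults (the file name enc.txt is just a placeholder), writing a single non-ASCII character and dumping the raw bytes makes the BOM visible:

    # Windows PowerShell: -Encoding utf8 writes UTF-8 *with* a BOM.
    'é' | Set-Content .\enc.txt -Encoding utf8
    [System.IO.File]::ReadAllBytes("$PWD\enc.txt")
    # -> 239 187 191 195 169 13 10  (EF BB BF BOM, UTF-8 'é', CRLF)
    # In PowerShell (Core) 7+ the default is BOM-less UTF-8, so the same
    # content written there starts directly with 195 169.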

    Note: With strings and numbers, Set-Content and > / Out-File can be used interchangeably (encoding differences in Windows PowerShell aside); for other types, only > / Out-File produces meaningful representations, albeit suitable only for human eyeballs, not programmatic processing - see this answer for more.
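
    To see that difference with a non-string type, compare the following sketch (the output file names are placeholders):

    # Set-Content stringifies each object via .ToString(); Out-File uses the
    # for-display formatting system.
    Get-ChildItem | Set-Content .\files1.txt  # one file/directory name per line
    Get-ChildItem | Out-File .\files2.txt     # the familiar formatted table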

    ASCII code points are limited to 7-bit values, i.e., the range 0x0 - 0x7f (127).

    Therefore, your input values 128 - 255 cannot be represented as ASCII characters, and using -Encoding ASCII causes such unrepresentable characters to be replaced with literal ? characters (code point 0x3F / 63), so information is lost.
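
    You can observe this lossy substitution directly; a minimal sketch (t.txt is a placeholder name):

    # A code point above 127 degrades to '?' (0x3F) with -Encoding ASCII.
    [char] 0xFF | Set-Content .\t.txt -Encoding Ascii
    Get-Content .\t.txt                           # -> ?
    [System.IO.File]::ReadAllBytes("$PWD\t.txt")  # -> 63 13 10 ('?' + CRLF)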


    Important:

    In memory, casting numbers such as 32 (0x20) or 255 (0xFF) to [char] (System.Char) instances causes the numbers to be interpreted as UTF-16 code units, representing Unicode characters[3] such as U+0020 and U+00FF as 2-byte sequences using the native byte order, because that's what characters are in .NET.
    Similarly, instances of the .NET [string] type (System.String) are sequences of such [char] instances.
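
    A quick sketch makes this in-memory picture concrete:

    # A [char] is a UTF-16 code unit; for characters in the 16-bit range its
    # numeric value equals the Unicode code point.
    $c = [char] 0xFF                                       # ÿ (U+00FF)
    [int] $c                                               # -> 255
    [System.Text.Encoding]::Unicode.GetBytes([string] $c)  # -> 255 0 (little-endian)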

    On output to a file or during serialization, re-encoding of these UTF-16 strings may occur, depending on the implied or specified output encoding.

    • If the output encoding is a fixed-width, single-byte encoding, such as ASCII, Default ("ANSI"), or OEM, loss of information may occur, namely if the string to output contains characters that cannot be represented in the target encoding.

    • Choose one of the Unicode-based encoding formats (compared byte for byte in the sketch after this list) to guarantee that:

      • no information is lost,
      • the resulting file is interpreted the same on all systems, irrespective of their system locale.

    The individual formats compare as follows:

      • UTF-8 is the most widely recognized encoding, but note that Windows PowerShell (unlike PowerShell Core) invariably prepends a BOM to such files, which can cause problems on Unix-like platforms and with utilities of Unix heritage; the format is optimized for backward compatibility with ASCII encoding and uses between 1 and 4 bytes to encode a single character.
      • UTF-16LE (which PowerShell calls Unicode) is a direct representation of the in-memory code units, but note that each character is encoded with (at least) 2 bytes, which can result in files up to twice the size of their UTF-8 equivalents for strings composed primarily of characters in the ASCII range.
      • UTF-16BE (which PowerShell calls bigendianunicode) reverses the byte order in each code unit.
      • UTF-32LE (which PowerShell calls UTF32) represents each Unicode character as a fixed 4-byte sequence; even more so than with UTF-16, this typically results in unnecessarily large files.
      • UTF-7 should be avoided altogether, as it is not part of the Unicode standard.
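
    The following sketch compares the byte sequences the formats above produce for the same two-character string:

    # 'Aé' is U+0041 plus U+00E9; byte counts and byte order differ per encoding.
    $s = 'Aé'
    [System.Text.Encoding]::UTF8.GetBytes($s)              # -> 65 195 169 (1 + 2 bytes)
    [System.Text.Encoding]::Unicode.GetBytes($s)           # -> 65 0 233 0 (UTF-16LE)
    [System.Text.Encoding]::BigEndianUnicode.GetBytes($s)  # -> 0 65 0 233 (UTF-16BE)
    [System.Text.Encoding]::UTF32.GetBytes($s)             # -> 65 0 0 0 233 0 0 0 (UTF-32LE)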

    [1] Among the legacy code pages supported on Windows, there are also fixed double-byte as well as variable-width encodings, but only for East Asian locales; sometimes they're (incorrectly) collectively referred to as DBCS (Double-Byte Character Set), as opposed to SBCS (Single-Byte Character Set); see the list of all Windows code pages.

    [2] See this answer for how to create BOM-less UTF-8 files in Windows PowerShell.

    [3] Strictly speaking, a UTF-16 code unit identifies a Unicode code point only for characters in the 16-bit range; some (rare) Unicode characters have a code point value that falls outside the range representable by a 16-bit integer, and such code points must instead be represented by a sequence of two code units, known as a surrogate pair.
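
    A minimal sketch illustrating such a surrogate pair:

    # U+1F600 (a grinning-face emoji) lies outside the 16-bit range and
    # therefore occupies two [char] instances in a .NET string.
    $s = [char]::ConvertFromUtf32(0x1F600)
    $s.Length                  # -> 2 (two UTF-16 code units)
    [int] $s[0]; [int] $s[1]   # -> 55357 (0xD83D), 56832 (0xDE00)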