How can I retrieve users with non-latin names from this output?
.\Pacli.exe USERSLIST INCLUDESUBLOCATIONS=YES output`(name`, enclose`, type`) > users.txt
This saves and recalls the non-Latin characters as ? or �, even with Get-Content -Encoding UTF8.
I tried to set
$OutputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
before this command, but got the same result.
tl;dr
Use the code at the bottom to temporarily change [Console]::OutputEncoding to match PACLI.exe's nonstandard output encoding, which appears to be ANSI, to ensure that its output is decoded correctly.
Per your own feedback, it turns out that PACLI.exe exhibits nonstandard behavior and outputs Windows-1252-encoded text.
Note that the specific code page used on a given system may be driven more abstractly by the legacy ANSI code page associated with that system's legacy system locale (aka language for non-Unicode programs). This is the - also nonstandard - behavior that Python exhibits, for instance.
On a US-English or Western European system, for instance, that ANSI code page is 1252 (Windows-1252), but on a Russian machine it would be 1251 (Windows-1251). The solution below assumes that PACLI.exe too exhibits this ANSI-code-page-dependent behavior, so it uses the following to retrieve the current machine's ANSI code page, whatever it may be; if you know that PACLI.exe hard-codes use of 1252, specifically, replace the expression with verbatim 1252:
[int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)
It is a general requirement that the character encoding stored in [Console]::OutputEncoding must match the actual encoding used in the output from an external program, because PowerShell uses the former to decode the latter.
Note:
This applies whenever PowerShell captures the output, such as in a variable or as part of an expression, or by relaying it to another command via the pipeline.
There is a notable exception with respect to >, the redirection operator, in PowerShell (Core) 7.4+: when applied to an external program, the raw output bytes are passed through to the target file. To instead capture the output and save it with a different encoding, use the pipeline and Set-Content.
The encoding used for sending data to an external program via the pipeline is stored in the $OutputEncoding preference variable, which - unfortunately - defaults to ASCII(!) in Windows PowerShell, and to UTF-8 in PowerShell (Core) 7 (which is preferable to ASCII, but inconsistent with the [Console]::OutputEncoding value - see GitHub issue #7233).
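A quick way to see what a given session currently uses for each direction is to inspect both settings side by side (the exact values depend on your system and PowerShell edition):

# Decoding direction: PowerShell uses this to interpret output FROM external programs.
[Console]::OutputEncoding

# Encoding direction: PowerShell uses this to encode data piped TO external programs.
$OutputEncoding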
Standard behavior of console applications would be to respect the current console's output code page, as reflected in the encoding stored in [Console]::OutputEncoding, in which case no extra effort is needed to properly decode and capture output. By default, that output code page is the system's legacy OEM code page, e.g. CP437 on US-English systems. It is the limitations of the single-byte[1] OEM code pages - which limit what you can output to 256 characters - that increasingly cause modern CLIs to output UTF-8 instead, as it is capable of encoding all Unicode characters. node.exe, the Node.js CLI, is one example. Others allow UTF-8 opt-in, via command-line options or environment variables.
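As a concrete illustration, on a US-English system with default settings the console's OEM code page and [Console]::OutputEncoding agree (the 437 values below are an assumption that only holds for such a system):

chcp                                 # e.g. "Active code page: 437" on a US-English system
[Console]::OutputEncoding.CodePage   # e.g. 437, matching the console's output code page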
The PACLI.exe and Python behavior of choosing the ANSI code page for their nonstandard encoding is unfortunate, because ANSI code pages are single-byte[1] too, and therefore don't solve the problem of a limited character repertoire.
There is a system-wide solution that makes most programs behave properly without extra effort; however, it has far-reaching consequences and can change the behavior of existing scripts in undesired ways.
Assuming you have administrative privileges, you can set the legacy system locale to UTF-8, which sets both the OEM and the ANSI code page to 65001, the UTF-8 code page. For details and a discussion of the far-reaching consequences, see this answer.
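If you go that route, you can verify the change with the same registry location used above; once the UTF-8 option is active (and the machine has been rebooted), both legacy code pages report 65001:

# The ANSI (ACP) and OEM (OEMCP) legacy code pages should both report 65001 (UTF-8).
Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP, OEMCP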
Note that this solution won't help with Windows CLIs such as sfc.exe and wsl.exe, which (situationally) output UTF-16LE; unless such CLIs offer UTF-8 opt-in (e.g. WSL's $env:WSL_UTF8=1), you still need to temporarily modify [Console]::OutputEncoding, as shown below.
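For instance, for wsl.exe's UTF-16LE output the temporary change follows the same save-and-restore pattern used for PACLI.exe below; a minimal sketch, assuming WSL is installed and $env:WSL_UTF8 is not set:

& {
  $prevEnc = [Console]::OutputEncoding
  # wsl.exe (situationally) emits UTF-16LE ("Unicode") output.
  [Console]::OutputEncoding = [Text.Encoding]::Unicode
  try {
    wsl.exe --list --quiet   # now decodes correctly
  } finally {
    [Console]::OutputEncoding = $prevEnc
  }
}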
Otherwise, you'll need to temporarily change [Console]::OutputEncoding to match a nonstandard CLI's output encoding, as shown below. That is, save the current value of [Console]::OutputEncoding before changing it, and restore it afterwards, to avoid affecting subsequent calls to (standard) external (console) applications (because changing [Console]::OutputEncoding affects the console window itself, the change would otherwise stay in effect for the remainder of the session).
Capturing output from a CLI that outputs ANSI-encoded text, like (presumably) PACLI.exe and Python, using your specific PACLI.exe call:
& {
  # Temporarily change the expected output encoding to ANSI.
  $prevEnc = [Console]::OutputEncoding
  [Console]::OutputEncoding =
    [Text.Encoding]::GetEncoding(
      [int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)
    )
  try {
    .\Pacli.exe USERSLIST INCLUDESUBLOCATIONS=YES output`(name`, enclose`, type`) |
      Set-Content -Encoding utf8 users.txt
  } finally {
    # Restore the original encoding.
    [Console]::OutputEncoding = $prevEnc
  }
}
For a CLI that outputs UTF-8 instead, use [Console]::OutputEncoding = [Text.UTF8Encoding]::new()
Note that > users.txt was deliberately replaced with | Set-Content -Encoding utf8 users.txt, to predictably generate a UTF-8 output file in both PowerShell editions - although in Windows PowerShell the file will have a BOM.[2]
That is, the use of the pipeline with a file-saving command ensures that decoding of the external-program output into .NET strings takes place first, with the file-saving command then using its default encoding or the encoding specified via -Encoding. This in effect allows you to transcode the output; in the case at hand, ANSI output turns into UTF-8 output.
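With users.txt written as UTF-8 this way, reading it back the way you originally tried now works as expected:

# The names are now stored as genuine UTF-8, so they round-trip correctly.
# (PowerShell 7 would assume UTF-8 even without -Encoding.)
Get-Content -Encoding UTF8 users.txt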
In Windows PowerShell and PowerShell 7 up to 7.3.x, the > operator too exhibits this decode-then-re-encode behavior, where it is in effect an alias of piping to Out-File using the latter's default encoding, which is UTF-16LE ("Unicode") in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell 7.3-.
As noted above, with respect to external programs, > in PowerShell 7.4+ now behaves differently and captures the raw byte output in the target file; that is, with > users.txt the above would create an ANSI-encoded file.
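To make the version difference concrete: inside the [Console]::OutputEncoding block above, the choice of redirection style determines the output file's encoding in PowerShell 7.4+ (users-ansi.txt is just an illustrative name):

# PowerShell 7.4+ only: the raw (ANSI) bytes from PACLI.exe are passed through as-is.
.\Pacli.exe USERSLIST INCLUDESUBLOCATIONS=YES output`(name`, enclose`, type`) > users-ansi.txt

# All editions: decode first (per [Console]::OutputEncoding), then re-encode as UTF-8.
.\Pacli.exe USERSLIST INCLUDESUBLOCATIONS=YES output`(name`, enclose`, type`) |
  Set-Content -Encoding utf8 users.txt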
[1] Except in CJK system locales.
[2] Unfortunately, workarounds are required to create BOM-less UTF-8 files in Windows PowerShell (which PowerShell 7 creates by default) - see this answer.