I have a weird thing going on in my PowerShell script that uses svn commands. The following is an example script:
$svnOutput = svn status
Write-Host "Output when saved in a variable"
$svnOutput
Write-Host "Direct Output"
svn status
If I run this script (in the PowerShell console), I get two different outputs when one of the file names contains non-ASCII characters (in my example, umlauts such as üöä). This is the output:
Output when saved in a variable
? Test_���.txt
Direct Output
? Test_äöü.txt
I am working on Windows Server 2022 with VisualSVN Server version 5.4.1. I have already tried the following ideas:
chcp 65001
$svnOutput = svn status
$svnOutput
$OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$svnOutput = svn status
$svnOutput
$svnOutput = & svn status | Out-String -Stream
$svnOutput
svn status > svn_status.txt
$svnOutput = Get-Content -Path "svn_status.txt" -Encoding UTF8
$svnOutput
But all of them give the same error.
PS: this also happens with other commands, such as
$svnOutput = svn add . --force
$svnOutput
which results in:
A Test_���.txt
Typing svn add . --force directly in a PowerShell session, or even in a script, works without any issues. Hopefully someone can help me here - thanks!
The SVN documentation states (emphasis added):
The default character encoding is derived from your operating system's native locale.
This is in the context of the --encoding parameter, which is documented as overriding the default encoding when submitting information ("your commit message"), but it seemingly (and sensibly) also applies when retrieving information.
On Windows, the native locale is the so-called legacy system locale, aka language for non-Unicode programs, and it determines two encodings, via Windows code pages: the OEM code page (which may be, e.g., CP437 or CP850) - typically used by console (terminal) applications - and the ANSI code page (e.g., Windows-1252 or Windows-1251) - typically used by GUI applications.
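To illustrate why it matters which of the two code pages is used, here is a minimal sketch that encodes a file name with one code page and decodes the resulting bytes with the other. Windows-1252 and CP850 are assumed purely as example values (your system's code pages may differ), and the variable names are illustrative.

```powershell
# Assumed example code pages; substitute your system's actual ANSI / OEM code pages.
$ansi = [Text.Encoding]::GetEncoding(1252)
$oem  = [Text.Encoding]::GetEncoding(850)

# Build the name from code points so the result doesn't depend on
# the encoding of the script file itself.
$name = 'Test_' + [char] 0xE4 + [char] 0xF6 + [char] 0xFC + '.txt'   # Test_äöü.txt

# What a program that writes ANSI-encoded output would emit for this name.
$bytes = $ansi.GetBytes($name)

# Same bytes, wrong code page: the umlauts come out as other characters.
$misdecoded = $oem.GetString($bytes)

# Same bytes, matching code page: the original name is restored.
$roundTrip = $ansi.GetString($bytes)

$misdecoded -ceq $name   # False
$roundTrip  -ceq $name   # True
```

The exact wrong characters depend on which encoding is used for decoding; decoding the same bytes as UTF-8 instead would produce replacement characters like the ones shown in the question.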
While the SVN documentation doesn't spell out which of these two code pages the svn utility uses, per your own feedback it seems to be the system's active ANSI code page (which, as noted, is unusual, because console applications by convention use the OEM code page; python is similarly unusual).
PowerShell consoles on Windows use the OEM code page by default, as reflected in [Console]::OutputEncoding.
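If you want to check these values on your own machine, the following sketch prints the code page PowerShell currently uses for decoding alongside the system's ANSI and OEM code pages. The registry path is the same one used in the solution in this answer; the `$env:OS` guard and the variable name `$nls` are additions for illustration, so that the snippet does nothing harmful on non-Windows systems.

```powershell
# Code page PowerShell currently uses to decode output from external programs.
$consoleCodePage = [Console]::OutputEncoding.CodePage
"Console output code page: $consoleCodePage"

# The system's active ANSI and OEM code pages live in the registry (Windows only).
if ($env:OS -eq 'Windows_NT') {
    $nls = 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage'
    "ANSI code page (ACP):   $(Get-ItemPropertyValue $nls ACP)"
    "OEM code page (OEMCP):  $(Get-ItemPropertyValue $nls OEMCP)"
}
```

On a default Windows setup, you would expect the first value to match OEMCP rather than ACP, which is the mismatch described above.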
Thus, in order for PowerShell to interpret (decode) ANSI output correctly,[1] [Console]::OutputEncoding must be (temporarily) set to the system's active ANSI code page, as follows:
& {
  # Temporarily change the expected output encoding to the ANSI code page.
  $prevEnc = [Console]::OutputEncoding
  [Console]::OutputEncoding =
    [Text.Encoding]::GetEncoding(
      [int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)
    )
  svn status
  # Restore the original encoding.
  [Console]::OutputEncoding = $prevEnc
}
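If you need this in several places, the same temporary switch can be wrapped in a small helper. The following is only a sketch: the function name `Invoke-WithOutputEncoding` and its parameters are hypothetical, not part of PowerShell or SVN. It uses try / finally so that the previous encoding is restored even if the command fails, and it defaults to the active ANSI code page using the same registry lookup as above (Windows only).

```powershell
# Hypothetical helper: runs a script block with [Console]::OutputEncoding
# temporarily set to the given encoding, then restores the previous value.
function Invoke-WithOutputEncoding {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $ScriptBlock,

        # Defaults to the system's active ANSI code page (registry lookup, Windows only).
        [Text.Encoding] $Encoding = [Text.Encoding]::GetEncoding(
            [int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)
        )
    )

    $prevEnc = [Console]::OutputEncoding
    try {
        [Console]::OutputEncoding = $Encoding
        # Output of the script block flows through to the caller and can be captured.
        & $ScriptBlock
    }
    finally {
        # Restore the original encoding, even if the script block threw.
        [Console]::OutputEncoding = $prevEnc
    }
}
```

With that in place, the capture from the question could be written as `$svnOutput = Invoke-WithOutputEncoding { svn status }`.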
Note that (non-CJK) ANSI encodings are fixed single-byte encodings and therefore limited to 256 characters. If you need full Unicode support, use --encoding utf8 in your svn call and set [Console]::OutputEncoding = [Text.UTF8Encoding]::new().[2]
[1] Note that decoding, i.e. converting an external program's raw byte output into .NET strings (as used by PowerShell) based on a character encoding, only comes into play when an external program's output is either captured (in a variable), relayed, or redirected. When printing directly to the display, encoding problems usually do not surface, because many CLIs use the Unicode-capable WriteConsoleW WinAPI function for that.
[2] Assuming you have administrative privileges, another option is to use UTF-8 as part of your system locale, which sets both the OEM and the ANSI code page to 65001, i.e. UTF-8. However, doing so has far-reaching consequences: see this answer.