powershell groovy encoding command-line locale

Groovy Cyrillic characters output problem

I have a problem with output in a Groovy script. For example, this code:

def rusAlphabet = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
def lowerCaseRusAlphabet = 'абвгдеёжзийклмнопрстуфхцшщъыьэюя'

println(rusAlphabet)
println(rusAlphabet.toLowerCase())
println(lowerCaseRusAlphabet)

prints:

AБВГДЕ?ЖЗИЙКЛМ?ОПРСТУФХЦЧШЩЪЫЬЭЮЯ
a??
абвгдеёжзийклмнопр?туфхцшщъыь?ю?

It works fine with Python scripts. I work on Windows 10 x64.

In CMD and PowerShell, Cyrillic characters were displayed as question marks. Then I checked "Beta: Use Unicode UTF-8 for worldwide language support" in the region administrative settings. Now characters are displayed normally, but not for Groovy scripts.

I tried this code in the script:

try {
    System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    throw new InternalError("VM does not support mandatory encoding UTF-8");
}

It prints:

AÐ‘Ð’Ð“Ð”Ð•Ð�Ð–Ð—Ð˜Ð™ÐšÐ›ÐœÐ�ÐžÐŸÐ Ð¡Ð¢Ð£Ð¤Ð¥Ð¦Ð§Ð¨Ð©ÐªÐ«Ð¬ÐÐ®Ð¯
að‘ð’ð“ð”ð•ð�ð–ð—ð˜ð™ðšð›ðœð�ðžðÿð ð¡ð¢ð£ð¤ð¥ð¦ð§ð¨ð©ðªð«ð¬ðð®ð¯
Ð°Ð±Ð²Ð³Ð´ÐµÑ‘Ð¶Ð·Ð¸Ð¹ÐºÐ»Ð¼Ð½Ð¾Ð¿Ñ€Ñ�Ñ‚ÑƒÑ„Ñ…Ñ†ÑˆÑ‰ÑŠÑ‹ÑŒÑ�ÑŽÑ�

Solution

I would have expected your activation of system-wide support for UTF-8 (Windows code page 65001) to solve your problem, because it sets both the OEM and the ANSI code page to 65001, which should make all legacy (non-Unicode) programs "speak UTF-8".
- Note that activating this feature - while convenient - has far-reaching consequences and can break legacy code: see this answer for background information.
- If you do not use this feature, the following is required in addition to ensuring that source code is read as UTF-8 (see next major point):
  - As shown in this answer mentioned in the comments, you must switch stdout and stderr (the standard output and standard error streams) to UTF-8 [1]:
```
System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
System.setErr(new PrintStream(new FileOutputStream(FileDescriptor.err), true, "UTF-8"));
```
  - You also need to execute the following to make a PowerShell session use UTF-8 consistently (see this answer for background information):
```
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
```
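
To see what switching the stream encoding changes at the byte level, here is a minimal Java sketch (also usable from Groovy). The class and method names (`Utf8PrintStreamDemo`, `encodeViaPrintStream`, `toHex`) are hypothetical helpers made up for illustration; the sketch builds a `PrintStream` the same way as the `System.setOut` call above, but captures the bytes in memory instead of writing them to the console:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8PrintStreamDemo {
    // Hypothetical helper: pushes text through a PrintStream that uses the
    // given charset and returns the bytes the stream actually produced.
    static byte[] encodeViaPrintStream(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A Unicode escape keeps this example independent of the source file's encoding.
        String be = "\u0411"; // CYRILLIC CAPITAL LETTER BE
        // UTF-8 can represent the letter as two bytes.
        System.out.println("UTF-8:        " + toHex(encodeViaPrintStream(be, "UTF-8")));
        // Windows-1252 has no Cyrillic letters, so the stream substitutes '?'.
        System.out.println("windows-1252: " + toHex(encodeViaPrintStream(be, "windows-1252")));
    }
}
```

If the stream's charset cannot represent a character, the character is silently replaced with `?`, which is one way question marks end up in console output.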
Your problem implies that Groovy doesn't interpret your source code file (script file) as UTF-8 but rather as Windows-1252, which is the ANSI code page for the US-English locale as well as many European ones.
- Groovy, perhaps needless to say, is based on Java, and Java versions 17 and below use the system's ANSI code page to interpret source code files, whereas v18+ commendably uses UTF-8. As such, with the ANSI code page being 65001, i.e. UTF-8, this shouldn't be a problem - but perhaps Java determines what the active ANSI code page is differently.
- However, irrespective of whether you've activated system-wide UTF-8 support, you can explicitly instruct Groovy / Java to interpret source code as UTF-8, as follows:
  - groovy `-Dfile.encoding=UTF8 <your-Groovy-script>
    - Note the ` before -, which is only necessary when calling from PowerShell, due to an unfortunate bug - see GitHub issue #6291.
  - Alternatively, you can preset this option via the JAVA_TOOL_OPTIONS environment variable (e.g., from PowerShell, for the current process:
    $env:JAVA_TOOL_OPTIONS = '-Dfile.encoding=UTF8'), though note that the Groovy CLI will then print a message indicating use of the environment variable.
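
The misinterpretation described above can be reproduced in isolation. The following is a sketch under the assumption that the script file is saved as UTF-8 and decoded as Windows-1252; the class and method names (`MojibakeDemo`, `misreadUtf8As`, `repair`) are hypothetical. It also prints the JVM's default charset, which may help check what Java actually picked up on your machine:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Simulates reading UTF-8 encoded text with the wrong charset.
    static String misreadUtf8As(String original, Charset wrong) {
        return new String(original.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Reverses the misreading. This is lossless only if every byte
    // has a defined character in the wrong charset.
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Unicode escape keeps this example independent of the source file's encoding.
        String be = "\u0411"; // CYRILLIC CAPITAL LETTER BE, UTF-8 bytes D0 91

        String garbled = misreadUtf8As(be, WINDOWS_1252);
        // One letter turns into two characters: U+00D0 and U+2018.
        garbled.codePoints().forEach(cp -> System.out.println(String.format("U+%04X", cp)));

        // Encoding the garbled text as Windows-1252 restores the original bytes.
        System.out.println("round trip ok: " + repair(garbled, WINDOWS_1252).equals(be));

        // Diagnostics: what the running JVM considers its default encoding.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

The two characters produced for the single letter are the same `Ð‘` pair that follows the initial `A` in the question's second output, which is consistent with the source file having been decoded as Windows-1252.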

[1] Note: I'm unclear on how to also switch stdin (the standard input stream) to UTF-8 for text-based operations; do tell us if you know.