I have a problem with the output of a Groovy script. For example, this code:
def rusAlphabet = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
def lowerCaseRusAlphabet = 'абвгдеёжзийклмнопрстуфхцшщъыьэюя'
println(rusAlphabet)
println(rusAlphabet.toLowerCase())
println(lowerCaseRusAlphabet)
prints:
AБВГДЕ?ЖЗИЙКЛМ?ОПРСТУФХЦЧШЩЪЫЬЭЮЯ
a??
абвгдеёжзийклмнопр?туфхцшщъыь?ю?
It works fine with Python scripts. I work on Windows 10 x64.
In CMD and PowerShell, Cyrillic characters were displayed as question marks. Then I checked "Beta: Use Unicode UTF-8 for worldwide language support" in the administrative region settings. Now it works fine and the characters are displayed normally, but not in Groovy scripts.
I tried adding this code to the script:
try {
System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
} catch (UnsupportedEncodingException e) {
throw new InternalError("VM does not support mandatory encoding UTF-8");
}
It prints:
AБВГДЕÐ�ЖЗИЙКЛМÐ�ОПРСТУФХЦЧШЩЪЫЬÐЮЯ
að‘ð’ð“ð”ð•ð�ð–ð—ð˜ð™ðšð›ðœð�ðžðÿð ð¡ð¢ð£ð¤ð¥ð¦ð§ð¨ð©ðªð«ð¬ðð®ð¯
абвгдеёжзийклмнопр�туфхцшщъыь�ю�
I would have expected your activation of system-wide support for UTF-8 (Windows code page 65001) to solve your problem, because it sets both the OEM and the ANSI code page to 65001, which should make all legacy (non-Unicode) programs "speak UTF-8".
Note that activating this feature, while convenient, has far-reaching consequences and can break legacy code; see this answer for background information.
If you do not use this feature, the following is required in addition to ensuring that source code is read as UTF-8 (see the next major point):
As shown in this answer mentioned in the comments, you must switch stdout and stderr (the standard output and standard error streams) to UTF-8:[1]
System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
System.setErr(new PrintStream(new FileOutputStream(FileDescriptor.err), true, "UTF-8"));
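For reference, here is a minimal, self-contained Java sketch of the same idea. It assumes Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor, which takes StandardCharsets.UTF_8 directly and so avoids the UnsupportedEncodingException handling. The class and method names (Utf8Console, utf8PrintStream, useUtf8StdStreams) are hypothetical. In a Groovy script only the two setOut/setErr calls are needed; java.io is imported by default in Groovy, but java.nio.charset.StandardCharsets would need an explicit import.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wrap a raw byte stream in a PrintStream that encodes all text as UTF-8.
    // autoFlush = true, so println() flushes immediately.
    static PrintStream utf8PrintStream(OutputStream raw) {
        return new PrintStream(raw, true, StandardCharsets.UTF_8);
    }

    // Replace System.out / System.err with UTF-8-encoding streams that write
    // straight to the process's stdout / stderr file descriptors.
    static void useUtf8StdStreams() {
        System.setOut(utf8PrintStream(new FileOutputStream(FileDescriptor.out)));
        System.setErr(utf8PrintStream(new FileOutputStream(FileDescriptor.err)));
    }

    public static void main(String[] args) {
        useUtf8StdStreams();
        // Cyrillic capital A, BE, VE written as Unicode escapes, so the result
        // does not depend on how the compiler decodes this source file.
        String abv = "\u0410\u0411\u0412";
        System.out.println(abv);
        System.out.println(abv.toLowerCase());
    }
}
```

Writing the literal with Unicode escapes sidesteps the separate source-encoding problem discussed in the next point; with literal Cyrillic in the source, the stream fix alone is not enough.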
You also need to execute the following to make a PowerShell session use UTF-8 consistently (see this answer for background information):
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
Your problem implies that Groovy doesn't interpret your source code file (script file) as UTF-8 but rather as Windows-1252, which is the ANSI code page for the US-English locale as well as many European ones.
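The garbled output is consistent with that diagnosis, as this Java sketch illustrates (class and method names are hypothetical). It encodes a Cyrillic letter as UTF-8, the way the script file is saved, and decodes those bytes as Windows-1252, the way Groovy apparently reads it. Б (U+0411) is stored as the bytes D0 91, which Windows-1252 decodes to Ð‘; lowercasing that yields ð‘, the very pattern visible in the toLowerCase() line of the second output in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8 (how the script file is stored on disk), then
    // decode those bytes as Windows-1252 (how the file is apparently read).
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String be = "\u0411"; // Cyrillic capital BE, as a Unicode escape
        String garbled = misdecode(be);
        // Bytes D0 91 read as Windows-1252 give U+00D0 U+2018.
        System.out.println(garbled.equals("\u00D0\u2018"));               // prints true
        // U+00D0 (capital eth) lowercases to U+00F0; U+2018 is unchanged.
        System.out.println(garbled.toLowerCase().equals("\u00F0\u2018")); // prints true
    }
}
```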
Groovy, perhaps needless to say, is based on Java, and Java versions 17 and below use the system's ANSI code page to interpret source code files, whereas v18+ commendably defaults to UTF-8. As such, with the ANSI code page being 65001, i.e. UTF-8, this shouldn't be a problem, but perhaps Java determines the active ANSI code page differently.
However, irrespective of whether you've activated system-wide UTF-8 support, you can explicitly instruct Groovy / Java to interpret source code as UTF-8, as follows:
groovy `-Dfile.encoding=UTF8 <your-Groovy-script>
The backtick (`) before -Dfile.encoding is only necessary when calling from PowerShell, due to an unfortunate bug; see GitHub issue #6291.
Alternatively, you can preset this option via the JAVA_TOOL_OPTIONS environment variable, e.g. from PowerShell, for the current process:
$env:JAVA_TOOL_OPTIONS = '-Dfile.encoding=UTF8'
Note, though, that the JVM then prints a "Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8" notice, so every Groovy CLI invocation will show a message indicating use of the environment variable.
[1] Note: I'm unclear on how to also switch stdin (the standard input stream) to UTF-8 for text-based operations; do tell us if you know.
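Regarding the footnote's open question, one possibility on the Java side, offered as a sketch and not verified against every Windows console configuration, is to avoid the default charset entirely and wrap System.in in a reader with an explicit UTF-8 charset. The class and method names are hypothetical. Whether the bytes arriving on stdin really are UTF-8 still depends on the sender, e.g. the console's input code page ([Console]::InputEncoding) or PowerShell's $OutputEncoding when piping into the program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8Stdin {
    // Decode a raw byte stream as UTF-8 text, regardless of the JVM default charset.
    static BufferedReader utf8Reader(InputStream raw) {
        return new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = utf8Reader(System.in);
        String line;
        while ((line = in.readLine()) != null) {
            // Echo each line with its length in UTF-16 code units. Printing it
            // correctly still requires a UTF-8 System.out (see setOut above).
            System.out.println(line.length() + ": " + line);
        }
    }
}
```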