I have an external source repository that apparently stores some source files using ISO-8859-1 character encoding. I am having trouble getting javac to change from default UTF-8 to ISO-8859-1 when invoked through Bazel.
I'm fetching the external repository via Bazel and can determine the charset of the fetched files:
> cd bazel-PROJECT/external/third-party/src
> file -i LibraryCode.java
LibraryCode.java: text/x-c; charset=iso-8859-1
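As a cross-check on the charset guess from file -i, the files that are not valid UTF-8 can be listed by trying to decode each one as UTF-8. This is a minimal sketch, assuming a POSIX shell and an iconv that exits non-zero on invalid input (glibc's does); is_valid_utf8 and list_non_utf8 are hypothetical helper names, not part of Bazel or javac.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print every .java file under directory $1
# that is not valid UTF-8, one path per line.
list_non_utf8() {
    find "$1" -type f -name '*.java' | sort | while IFS= read -r f; do
        is_valid_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running list_non_utf8 src from the fetched repository directory should print only the files that javac would reject under its UTF-8 default. Pure ASCII files are valid UTF-8 and are not listed.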
Building the external sources via Bazel's java_library, or compiling the external repository source files with javac directly from the command line, fails (as expected) with:
error: unmappable character for encoding UTF8
Passing javac's -encoding option solves the compilation issue when javac is run from the command line against the external repository files fetched by Bazel:
> javac -encoding iso-8859-1 LibraryCode.java
However, I've been unable to successfully pass the -encoding option to javac via Bazel.
I've tried the following so far. None of these attempts got around the charset mismatch and compiler error:
1) repository_rule build_file: thirdparty.BUILD
java_library(
    name = "thirdparty",
    srcs = glob(["src/**/*.java"]),
    javacopts = ["-encoding iso-8859-1"],
    visibility = ["//visibility:public"]
)
2) Bazel command line:
> bazel build --javacopt="-encoding iso-8859-1" target
3) Defining Java toolchain target with encoding setting:
java_toolchain(
    name = "toolchain",
    bootclasspath = ["@bazel_tools//tools/jdk:bootclasspath"],
    encoding = "iso-8859-1",
    extclasspath = ["@bazel_tools//tools/jdk:extdir"],
    forcibly_disable_header_compilation = 0,
    genclass = ["@bazel_tools//tools/jdk:GenClass_deploy.jar"],
    header_compiler = ["@bazel_tools//tools/jdk:turbine_deploy.jar"],
    ijar = ["@bazel_tools//tools/jdk:ijar"],
    javabuilder = ["@bazel_tools//tools/jdk:JavaBuilder_deploy.jar"],
    javac = ["@bazel_tools//third_party/java/jdk/langtools:javac_jar"],
    javac_supports_workers = 1,
    jvm_opts = [
        "-XX:+TieredCompilation",
        "-XX:TieredStopAtLevel=1",
    ],
    misc = [
        "-XDskipDuplicateBridges=true",
    ],
    singlejar = ["@bazel_tools//tools/jdk:SingleJar_deploy.jar"],
    source_version = "8",
    target_version = "8",
    visibility = ["//visibility:public"]
)
All three attempts end with the same error: unmappable character for encoding UTF8.
What is the mistake I am making in setting the javac encoding via Bazel?
I can try to work around the issue by converting the external repository source files via iconv but I'd prefer solving it via javac's encoding setting as intended.
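The iconv workaround mentioned above could look roughly like the following. It is a sketch under stated assumptions rather than a tested part of any Bazel setup: it assumes a POSIX shell, that every .java file under the given directory really is ISO-8859-1, and convert_tree is a hypothetical helper name.

```shell
# Hypothetical helper: convert every .java file under directory $1
# from ISO-8859-1 to UTF-8, in place.
convert_tree() {
    find "$1" -type f -name '*.java' | while IFS= read -r f; do
        # Convert into a temporary file first so that a failed
        # conversion never truncates the original source file.
        if iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.utf8.tmp"; then
            mv "$f.utf8.tmp" "$f"
        else
            rm -f "$f.utf8.tmp"
            echo "conversion failed: $f" >&2
        fi
    done
}
```

The conversion is not idempotent: running it a second time over files that are already UTF-8 would double-encode any non-ASCII characters. The converted copies would also be replaced whenever Bazel re-fetches the external repository, so the step would need to be repeated after each fetch.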
The java_toolchain encoding not getting recognized appears to be a bug. I've a preliminary fix for this on my local copy of Bazel -- java_toolchain approach to changing charset (option #3 above) appears to work.
Tracking this issue and a proposed fix in: #2926
Unfortunately, there is no good way to do that from the command line or on a per-target basis. You have to write a java_toolchain and point Bazel to it. Deriving it from the one that ships with Bazel would result in:
java_toolchain(
    name = "toolchain",
    bootclasspath = ["@bazel_tools//tools/jdk:bootclasspath"],
    encoding = "iso-8859-1",
    extclasspath = ["@bazel_tools//tools/jdk:extclasspath"],
    forcibly_disable_header_compilation = 0,
    genclass = ["@bazel_tools//tools/jdk:genclass"],
    header_compiler = ["@bazel_tools//tools/jdk:turbine"],
    ijar = ["@bazel_tools//tools/jdk:ijar"],
    javabuilder = ["@bazel_tools//tools/jdk:javabuilder"],
    javac = ["@bazel_tools//third_party/java/jdk/langtools:javac_jar"],
    javac_supports_workers = 1,
    jvm_opts = [
        "-XX:+TieredCompilation",
        "-XX:TieredStopAtLevel=1",
    ],
    misc = [
        "-XDskipDuplicateBridges=true",
    ],
    singlejar = ["@bazel_tools//tools/jdk:SingleJar_deploy.jar"],
    source_version = "8",
    target_version = "8",
)
(You might want to change the singlejar target to the C++ binary for performance reasons: @bazel_tools//tools/jdk:singlejar, IIRC.)
Then you can point to that toolchain with --java_toolchain=//my:toolchain (see the java_toolchain flag).