Bazel: java_library character encoding via javacopts not working?


I have an external source repository that apparently stores some source files using ISO-8859-1 character encoding. I am having trouble getting javac to change from default UTF-8 to ISO-8859-1 when invoked through Bazel.

I'm fetching the external repository via Bazel and can determine the charset of the fetched files:

> cd bazel-PROJECT/external/third-party/src
> file -i LibraryCode.java 
LibraryCode.java: text/x-c; charset=iso-8859-1
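
The same check can be scripted across a whole source tree. The following is a minimal sketch, not part of the original setup: it assumes `iconv` is installed, and the function name `find_non_utf8` is made up for illustration. It relies on `iconv` exiting with a non-zero status when its input cannot be decoded as UTF-8.

```shell
# Print every .java file under the given directory that is NOT valid UTF-8.
# find_non_utf8 is a hypothetical helper name, not a Bazel or JDK tool.
find_non_utf8() {
  find "$1" -type f -name '*.java' -exec sh -c '
    for f in "$@"; do
      # iconv fails on byte sequences that are not valid UTF-8
      iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}

# Example (path taken from the transcript above):
#   find_non_utf8 bazel-PROJECT/external/third-party/src
```

Files that contain only ASCII pass this check, because ASCII is a subset of both UTF-8 and ISO-8859-1; only files with bytes above 0x7F that do not form valid UTF-8 sequences are listed.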

Building the external sources via Bazel's java_library, or compiling them with javac directly from the command line, fails as expected with:

error: unmappable character for encoding UTF8

Passing javac's -encoding option solves the compilation issue when javac is run from the command line against the external repository files fetched by Bazel:

> javac -encoding iso-8859-1 LibraryCode.java

However, I've been unable to successfully pass the -encoding option to javac via Bazel.

So far I've tried:

  1. setting javacopts in the java_library rule
  2. passing --javacopt on Bazel's command line
  3. declaring a java_toolchain rule with encoding ISO-8859-1 and selecting it with --java_toolchain on Bazel's command line.

None of these attempts got around the charset mismatch and compiler error.

1) repository_rule build_file: thirdparty.BUILD

java_library(
  name = "thirdparty",
  srcs = glob(["src/**/*.java"]),
  javacopts = ["-encoding iso-8859-1"],
  visibility = ["//visibility:public"]
)

2) Bazel command line:

> bazel build --javacopt="-encoding iso-8859-1" target 

3) Defining Java toolchain target with encoding setting:

java_toolchain(
  name = "toolchain",
  bootclasspath = ["@bazel_tools//tools/jdk:bootclasspath"],
  encoding = "iso-8859-1",
  extclasspath = ["@bazel_tools//tools/jdk:extdir"],
  forcibly_disable_header_compilation = 0,
  genclass = ["@bazel_tools//tools/jdk:GenClass_deploy.jar"],
  header_compiler = ["@bazel_tools//tools/jdk:turbine_deploy.jar"],
  ijar = ["@bazel_tools//tools/jdk:ijar"],
  javabuilder = ["@bazel_tools//tools/jdk:JavaBuilder_deploy.jar"],
  javac = ["@bazel_tools//third_party/java/jdk/langtools:javac_jar"],
  javac_supports_workers = 1,
  jvm_opts = [
    "-XX:+TieredCompilation",
    "-XX:TieredStopAtLevel=1",
  ],
  misc = [
    "-XDskipDuplicateBridges=true",
  ],
  singlejar = ["@bazel_tools//tools/jdk:SingleJar_deploy.jar"],
  source_version = "8",
  target_version = "8",

  visibility = ["//visibility:public"]
)

All three attempts end with the same error: unmappable character for encoding UTF8.

What is the mistake I am making in setting the javac encoding via Bazel?

I can try to work around the issue by converting the external repository source files via iconv but I'd prefer solving it via javac's encoding setting as intended.
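
For reference, the iconv workaround mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not a recommended fix: the function name `convert_latin1_to_utf8` is hypothetical, every `.java` file under the directory is assumed to be ISO-8859-1, and where the conversion would run (for example as a step after the repository is fetched) is left open.

```shell
# Re-encode every .java file under the given directory from ISO-8859-1
# to UTF-8, in place. convert_latin1_to_utf8 is a hypothetical helper name.
# Assumption: all matched files really are ISO-8859-1. Running this on a
# file that is already UTF-8 would double-encode its non-ASCII characters.
convert_latin1_to_utf8() {
  find "$1" -type f -name '*.java' -exec sh -c '
    for f in "$@"; do
      tmp="$f.utf8.tmp"
      # Write to a temporary file first so a failed conversion
      # never truncates the original source file.
      if iconv -f ISO-8859-1 -t UTF-8 "$f" > "$tmp"; then
        mv "$tmp" "$f"
      else
        rm -f "$tmp"
        echo "failed to convert: $f" >&2
      fi
    done
  ' sh {} +
}

# Example:
#   convert_latin1_to_utf8 bazel-PROJECT/external/third-party/src
```

Because the conversion is not idempotent, it should run exactly once per fetch of the sources.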

Follow Up

The java_toolchain encoding not being recognized appears to be a bug. I have a preliminary fix in my local copy of Bazel; with it, the java_toolchain approach to changing the charset (option 3 above) appears to work.

I'm tracking this issue and a proposed fix in #2926.


Solution

  • Unfortunately, there is no good way to do that from the command line or on a per-target basis. You have to define a java_toolchain and point Bazel to it. Deriving it from the one that ships with Bazel would result in:

    java_toolchain(
        name = "toolchain",
        bootclasspath = ["@bazel_tools//tools/jdk:bootclasspath"],
        encoding = "iso-8859-1",
        extclasspath = ["@bazel_tools//tools/jdk:extclasspath"],
        forcibly_disable_header_compilation = 0,
        genclass = ["@bazel_tools//tools/jdk:genclass"],
        header_compiler = ["@bazel_tools//tools/jdk:turbine"],
        ijar = ["@bazel_tools//tools/jdk:ijar"],
        javabuilder = ["@bazel_tools//tools/jdk:javabuilder"],
        javac = ["@bazel_tools//third_party/java/jdk/langtools:javac_jar"],
        javac_supports_workers = 1,
        jvm_opts = [
            "-XX:+TieredCompilation",
            "-XX:TieredStopAtLevel=1",
        ],
        misc = [
            "-XDskipDuplicateBridges=true",
        ],
        singlejar = ["@bazel_tools//tools/jdk:SingleJar_deploy.jar"],
        source_version = "8",
        target_version = "8",
    )
    

    (You might want to change the singlejar target to the C++ binary for performance reasons: @bazel_tools//tools/jdk:singlejar, IIRC.)

    Then you can point to that toolchain with --java_toolchain=//my:toolchain (see the documentation for the java_toolchain flag).
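
To avoid typing the flag on every invocation, it can be recorded in the workspace's `.bazelrc`. The sketch below is an illustration only: `pin_java_toolchain` is a hypothetical helper name, `//my:toolchain` is the label used in the answer above, and it assumes the Bazel version in use accepts `--java_toolchain`.

```shell
# Append the toolchain flag to a workspace's .bazelrc so that every
# `bazel build` in that workspace uses it.
# pin_java_toolchain is a hypothetical helper name.
#   $1: workspace root directory
#   $2: label of the java_toolchain target, e.g. //my:toolchain
pin_java_toolchain() {
  printf 'build --java_toolchain=%s\n' "$2" >> "$1/.bazelrc"
}

# Example:
#   pin_java_toolchain . //my:toolchain
# One-off equivalent on the command line (flag taken from the answer above):
#   bazel build --java_toolchain=//my:toolchain target
```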