Tags: java, string, encoding, java-native-interface

"Compact Strings" has introduced performance issues for JNI string access. How can they be avoided?


To minimize copying and access the contents of a String directly, JNI provides GetStringCritical, which used to work well. However, since Compact Strings were introduced in Java 9, a String is stored internally as either Latin1 or UTF-16. In English-speaking regions, the vast majority of characters can be represented in Latin1, so most String objects use Latin1 storage. This makes GetStringCritical awkward: it has to hand back UTF-16 characters, so a Latin1-backed string must first be converted to UTF-16, which always involves a copy.
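To make the cost concrete, here is a minimal, JVM-independent sketch in C++ of why a Latin1-backed string cannot be exposed as 16-bit characters without a copy. The helper names fits_latin1 and inflate_latin1 are hypothetical and are not part of JNI or HotSpot. The sketch only assumes that a string may use Latin1 storage when every character fits in one byte, and that widening each byte yields the corresponding UTF-16 code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true when every UTF-16 code unit fits in one byte,
// i.e. the string could be kept in one-byte-per-character (Latin1) storage.
bool fits_latin1(const std::u16string& s) {
    for (char16_t c : s) {
        if (c > 0xFF) return false;
    }
    return true;
}

// Hypothetical helper: widen Latin1 bytes to UTF-16 code units.
// Latin1 values 0x00..0xFF map directly to U+0000..U+00FF, but the result
// is a new buffer twice the size of the input, so the copy is unavoidable.
std::u16string inflate_latin1(const std::uint8_t* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(static_cast<char16_t>(src[i]));
    }
    return out;
}
```

Whatever the JVM does internally, any API that promises 16-bit characters has to perform a widening step of this kind for a one-byte-per-character string.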

Is there still a way to directly access the string content on the C++ side? Do we really have to resort to reflection to retrieve the byte[] value from the String?


Solution

  • Is there still a way to directly access the string content on the C++ side?

    Not in a safe and portable way. If you tried to access the string content directly in C++ (i.e. without using GetStringChars or GetStringCritical), you could get unlucky and find that the GC moves the heap object containing the content while you are accessing it.

  • Do we really have to resort to reflection to retrieve the byte[] value from the String?

    That would be a bad idea. It would be non-portable, because the value field is a private implementation detail of String, and using reflection from JNI/C++ is liable to be more expensive than using it from Java.

    Note that sometime between Java 11 and Java 17, the C++ implementation of JNI GetStringCritical changed so that it now always copies the characters.

    I haven't researched why they changed this, but if you cared to put in the effort, there could be clues in the commit messages and associated issues in the issue trackers. The code is in "src/hotspot/share/prims/jni.cpp" or "hotspot/src/share/prims/jni.cpp" depending on the OpenJDK version.

    In my opinion, you are better off not trying to optimize this. If you do optimize it, you will find that you need to optimize it differently for different Java versions.