Java native code string ending

Does the string returned by GetStringUTFChars() end with a null terminator? Or do I need to determine the length using GetStringUTFLength() and null-terminate it myself?

Solution

This answer is obsolete, and ceztko's answer (below) should now be preferred over mine. I'm keeping it here for historical purposes.

JNI now offers both a pointer to the UTF-8 characters and the length of the character array, and indicates that you should not rely on UTF-8 buffers being NUL-terminated. (NUL termination may be an implementation detail of a particular JVM; one should not rely on it to be portable, and it may even change with future versions of the JVM you're using.)
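If you need a C string anyway, one portable approach is to copy the bytes into your own buffer and terminate it yourself. The following is only a minimal sketch: copy_with_nul is a hypothetical helper name (it is not part of JNI), and it assumes you already have the byte pointer and the byte count in hand.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of JNI): copy `len` bytes from `bytes`
 * into a freshly allocated buffer and append an explicit '\0'.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
char *copy_with_nul(const char *bytes, size_t len)
{
    char *out = malloc(len + 1);    /* one extra byte for the terminator */
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, bytes, len);        /* never reads past bytes[len - 1] */
    out[len] = '\0';                /* terminate it ourselves */
    return out;
}
```

In a native method, bytes would presumably come from (*env)->GetStringUTFChars(env, jstr, NULL) and len from (*env)->GetStringUTFLength(env, jstr), with (*env)->ReleaseStringUTFChars(env, jstr, bytes) called once the copy has been made. The copy never depends on the JVM's buffer having a terminator of its own.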

Original answer:

Yes, GetStringUTFChars returns a null-terminated string. However, I don't think you should take my word for it; instead, you should find an authoritative online source that answers this question.

Let's start with the actual Java Native Interface Specification itself, where it says:

Returns a pointer to an array of bytes representing the string in modified UTF-8 encoding. This array is valid until it is released by ReleaseStringUTFChars().

Oh, surprisingly it doesn't say whether it's null-terminated or not. Boy, that seems like a huge oversight, and fortunately somebody was kind enough to log this bug on Sun's Java bug database back in 2008. The notes on the bug point you to a similar but different documentation bug (which was closed without action), which suggests that readers buy a book, "The Java Native Interface: Programmer's Guide and Specification", as there's a suggestion that it become the new specification for JNI.

But we're looking for an authoritative online source, and this is neither authoritative (it's not yet the specification) nor online.

Fortunately, the reviews for said book on a certain popular online book retailer suggest that the book is freely available online from Sun, and that would at least satisfy the online portion. Sun's JNI web page has a link that looks tantalizingly close, but that link sadly doesn't go where it says it goes.

So I'm afraid I cannot point you to an authoritative online source for this, and you'll have to buy the book (it's actually a good book), where it will explain to you that:

UTF-8 strings are always terminated with the '\0' character, whereas Unicode strings are not. To find out how many bytes are needed to represent a jstring in the UTF-8 format, JNI programmers can either call the ANSI C function strlen on the result of GetStringUTFChars, or call the JNI function GetStringUTFLength on the jstring reference directly.

(Note that in the above sentence, "Unicode" means "UTF-16", or more accurately "the internal two-byte string representation used by Java", though finding proof of that is left as an exercise for the reader.)
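To make the byte count concrete, here is a rough sketch of how many bytes a sequence of two-byte Java chars occupies in JNI's modified UTF-8. The function name modified_utf8_length is hypothetical, and this is an illustration of the encoding rules rather than how any particular JVM implements GetStringUTFLength. Note that U+0000 is encoded as the two bytes 0xC0 0x80, so the encoded data never contains a zero byte of its own, which is why calling strlen on a terminated buffer gives the same answer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes needed to encode `n` UTF-16 code
 * units in modified UTF-8. The terminator is not counted.
 */
size_t modified_utf8_length(const uint16_t *units, size_t n)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = units[i];
        if (c >= 0x0001 && c <= 0x007F) {
            bytes += 1;   /* ASCII, except U+0000 */
        } else if (c <= 0x07FF) {
            bytes += 2;   /* U+0080..U+07FF, plus U+0000 as 0xC0 0x80 */
        } else {
            bytes += 3;   /* rest of the BMP; each surrogate half is encoded separately */
        }
    }
    return bytes;
}
```

For example, the three chars 'A', U+0000, 'B' need 1 + 2 + 1 = 4 bytes, and a supplementary character stored as a surrogate pair needs 3 + 3 = 6 bytes rather than the 4 bytes that standard UTF-8 would use.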