java, validation, utf-16

How to check if a character is UTF-16


I would like to validate outgoing Strings. The rule is that they have to be UTF-16 and below.

How can I check whether a character is a valid UTF-16 char? Can I do it with a Java method or with a regex?

Thanks for info


Solution

  • Any single char in Java is always a valid UTF-16 code unit. A sequence of chars can still be invalid, though: in a surrogate pair (http://en.wikipedia.org/wiki/UTF-16), a high surrogate char must be immediately followed by a low surrogate char, and a low surrogate must not appear on its own. If that is what you mean, you can try this function:

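    // Returns true if every surrogate in the array is part of a well-formed high + low pair.
    static boolean isValidSequence(char[] a) {
        for (int i = 0; i < a.length; i++) {
            if (Character.isHighSurrogate(a[i])) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i < a.length - 1 && Character.isLowSurrogate(a[i + 1])) {
                    i++; // skip the low half of the pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(a[i])) {
                // A low surrogate without a preceding high surrogate is invalid.
                return false;
            }
        }
        return true;
    }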
    

    You can also take a look at this function: http://www.java2s.com/Code/Java/Development-Class/ReturnscodetrueifthespecifiedcharactersequenceisavalidsequenceofUTF16charvalues.htm
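
    Since the question is about outgoing Strings rather than char arrays, here is a minimal sketch of the same surrogate-pairing check applied to a CharSequence, with a few sample inputs. The class name Utf16Check and the method name isWellFormed are made up for illustration; only the Character.isHighSurrogate and Character.isLowSurrogate calls come from the JDK.

```java
public class Utf16Check {

    // Hypothetical helper: returns true if every surrogate in s belongs to
    // a well-formed pair (a high surrogate immediately followed by a low one).
    public static boolean isWellFormed(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the pair
                } else {
                    return false; // high surrogate at the end or not followed by a low one
                }
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String pair = "\uD83D\uDE00";     // one supplementary character as a surrogate pair
        String loneHigh = "abc\uD83D";    // high surrogate with nothing after it
        String reversed = "\uDE00\uD83D"; // low surrogate first, then high

        System.out.println(isWellFormed("plain text")); // true
        System.out.println(isWellFormed(pair));         // true
        System.out.println(isWellFormed(loneHigh));     // false
        System.out.println(isWellFormed(reversed));     // false
    }
}
```

    Accepting a CharSequence means the check works on String, StringBuilder and similar types without first copying to a char[].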