Search code examples
javastringuppercase

Can Java String.toUpperCase() ever fail?


Situation: There is a Java ESB, which is taking input (family name) from a Vaadin web form, and should guarantee upper-casing it before saving it into DB.

I was assigned to investigate a reported issue, that lower-case characters sometimes appear in DB. I have learned, that the program is using String.toUpperCase() just before saving data through EntityManager (it is the only place that is modifying received data).

So what I wonder is, whether this shall be enough. So far I havent found any "well-known" problems related to toUpperCase() function, but I wanna be sure.

So the question - Does String.toUpperCase() always do its job? Or are there any possible characters or circumstances when error may occur and the letters may not be upper-cased?


Solution

  • Can Java String.toUpperCase() ever fail?

    It depends on whether you are passing in locale sensitive Strings (see below).


    In the implementation for Java.lang.String, it simply uses the default locale:

    public String toUpperCase() {
        return toUpperCase(Locale.getDefault());
    }
    

    toUpperCase(Locale) converts all of the characters in this String to upper case using the rules of the given Locale. Case mapping is based on the Unicode Standard version specified by the Character class. Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String.

    This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags.

    To obtain correct results for locale insensitive strings, use toUpperCase(Locale.ENGLISH).

    In case you are interested on how toUpperCase(Locale) was implemented:

    public String toUpperCase(Locale locale) {
        if (locale == null) {
            throw new NullPointerException();
        }
    
        int firstLower;
        final int len = value.length;
    
        /* Now check if there are any characters that need to be changed. */
        scan: {
            for (firstLower = 0 ; firstLower < len; ) {
                int c = (int)value[firstLower];
                int srcCount;
                if ((c >= Character.MIN_HIGH_SURROGATE)
                        && (c <= Character.MAX_HIGH_SURROGATE)) {
                    c = codePointAt(firstLower);
                    srcCount = Character.charCount(c);
                } else {
                    srcCount = 1;
                }
                int upperCaseChar = Character.toUpperCaseEx(c);
                if ((upperCaseChar == Character.ERROR)
                        || (c != upperCaseChar)) {
                    break scan;
                }
                firstLower += srcCount;
            }
            return this;
        }
    
        /* result may grow, so i+resultOffset is the write location in result */
        int resultOffset = 0;
        char[] result = new char[len]; /* may grow */
    
        /* Just copy the first few upperCase characters. */
        System.arraycopy(value, 0, result, 0, firstLower);
    
        String lang = locale.getLanguage();
        boolean localeDependent =
                (lang == "tr" || lang == "az" || lang == "lt");
        char[] upperCharArray;
        int upperChar;
        int srcChar;
        int srcCount;
        for (int i = firstLower; i < len; i += srcCount) {
            srcChar = (int)value[i];
            if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
                (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
                srcChar = codePointAt(i);
                srcCount = Character.charCount(srcChar);
            } else {
                srcCount = 1;
            }
            if (localeDependent) {
                upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
            } else {
                upperChar = Character.toUpperCaseEx(srcChar);
            }
            if ((upperChar == Character.ERROR)
                    || (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
                if (upperChar == Character.ERROR) {
                    if (localeDependent) {
                        upperCharArray =
                                ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                    } else {
                        upperCharArray = Character.toUpperCaseCharArray(srcChar);
                    }
                } else if (srcCount == 2) {
                    resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
                    continue;
                } else {
                    upperCharArray = Character.toChars(upperChar);
                }
    
                /* Grow result if needed */
                int mapLen = upperCharArray.length;
                if (mapLen > srcCount) {
                    char[] result2 = new char[result.length + mapLen - srcCount];
                    System.arraycopy(result, 0, result2, 0, i + resultOffset);
                    result = result2;
                }
                for (int x = 0; x < mapLen; ++x) {
                    result[i + resultOffset + x] = upperCharArray[x];
                }
                resultOffset += (mapLen - srcCount);
            } else {
                result[i + resultOffset] = (char)upperChar;
            }
        }
        return new String(result, 0, len + resultOffset);
    }