Search code examples
javastringjava-6

Understanding logic in CaseInsensitiveComparator


Can anyone explain the following code from String.java, specifically why there are three if statements (which I've marked //1, //2 and //3)?

private static class CaseInsensitiveComparator
                     implements Comparator<String>, java.io.Serializable {
// use serialVersionUID from JDK 1.2.2 for interoperability
private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1=s1.length(), n2=s2.length();
        for (int i1=0, i2=0; i1<n1 && i2<n2; i1++, i2++) {
            char c1 = s1.charAt(i1);
            char c2 = s2.charAt(i2);
            if (c1 != c2) {/////////////////////////1
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {/////////////////////////2
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {/////////////////////////3
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
}

Solution

  • Normally, we would expect to convert the case once and compare and be done with it. However, the code convert the case twice, and the reason is stated in the comment on a different method public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len):

    Unfortunately, conversion to uppercase does not work properly for the Georgian alphabet, which has strange rules about case conversion. So we need to make one last check before exiting.


    Appendix

    The code of regionMatches has a few difference from the code in the CaseInsenstiveComparator, but essentially does the same thing. The full code of the method is quoted below for the purpose of cross-checking:

    public boolean regionMatches(boolean ignoreCase, int toffset,
                           String other, int ooffset, int len) {
        char ta[] = value;
        int to = offset + toffset;
        char pa[] = other.value;
        int po = other.offset + ooffset;
        // Note: toffset, ooffset, or len might be near -1>>>1.
        if ((ooffset < 0) || (toffset < 0) || (toffset > (long)count - len) ||
                (ooffset > (long)other.count - len)) {
            return false;
        }
        while (len-- > 0) {
            char c1 = ta[to++];
            char c2 = pa[po++];
            if (c1 == c2) {
                continue;
            }
            if (ignoreCase) {
                // If characters don't match but case may be ignored,
                // try converting both characters to uppercase.
                // If the results match, then the comparison scan should
                // continue.
                char u1 = Character.toUpperCase(c1);
                char u2 = Character.toUpperCase(c2);
                if (u1 == u2) {
                    continue;
                }
                // Unfortunately, conversion to uppercase does not work properly
                // for the Georgian alphabet, which has strange rules about case
                // conversion.  So we need to make one last check before
                // exiting.
                if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    continue;
                }
            }
            return false;
        }
        return true;
    }