Tags: language-agnostic, unicode, localization, internationalization, toupper

Reimplementing ToUpper()


How would you write ToUpper() if it didn't exist? Bonus points for i18n and L10n.

Curiosity sparked by this: http://thedailywtf.com/Articles/The-Long-Way-toUpper.aspx
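For reference, here is a minimal Java sketch of what any reimplementation would have to match. The JDK's `String.toUpperCase(Locale)` already applies one-to-many mappings (German ß becomes "SS") and locale-specific rules (in Turkish, dotted i uppercases to U+0130). The class name and the `upper` wrapper are illustrative only and are not part of any API.

```java
import java.util.Locale;

public class BuiltInUpper {
    // Thin wrapper so the locale is always explicit: toUpperCase() with no
    // argument silently uses the JVM's default locale.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // One-to-many mapping: sharp s expands to two letters.
        System.out.println(upper("stra\u00DFe", Locale.ROOT));             // STRASSE
        // Locale-sensitive mapping: Turkish dotted i becomes U+0130.
        System.out.println(upper("title", turkish).equals("T\u0130TLE"));  // true
        // Same input under English rules.
        System.out.println(upper("title", Locale.ENGLISH));                // TITLE
    }
}
```

The output length can differ from the input length, so a hand-rolled version that maps one char to one char cannot reproduce this behavior.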


Solution

    1. I download the Unicode tables
    2. I import the tables into a database
    3. I write a method upper().
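Steps 1 to 3 can also be sketched without a database. This is a different technique from the name lookup used in the answer: assuming the downloaded table is `UnicodeData.txt` from the Unicode Character Database, field 12 (0-based) of each line already holds the Simple_Uppercase_Mapping, so no name juggling is needed. The class and method names are hypothetical, and the sample lines are an abridged stand-in for the real file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableUpper {
    // Simple_Uppercase_Mapping is field 12 (0-based) of each UnicodeData.txt line.
    static final int UPPER_FIELD = 12;

    // Build a "code point -> uppercase code point" map from UnicodeData.txt-style lines.
    static Map<Integer, Integer> loadUpperMap(List<String> lines) {
        Map<Integer, Integer> map = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(";", -1); // -1 keeps empty trailing fields
            if (f.length > UPPER_FIELD && !f[UPPER_FIELD].isEmpty()) {
                map.put(Integer.parseInt(f[0], 16), Integer.parseInt(f[UPPER_FIELD], 16));
            }
        }
        return map;
    }

    // Map each code point (not each char) so supplementary characters survive.
    static String upper(String s, Map<Integer, Integer> map) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> out.appendCodePoint(map.getOrDefault(cp, cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Abridged sample; a real program would read the whole downloaded file.
        List<String> sample = List.of(
            "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
            "0062;LATIN SMALL LETTER B;Ll;0;L;;;;;N;;;0042;;0042",
            "0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049",
            "03B1;GREEK SMALL LETTER ALPHA;Ll;0;L;;;;;N;;;0391;;0391",
            "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;");
        Map<Integer, Integer> map = loadUpperMap(sample);
        System.out.println(upper("ab", map));                       // AB
        System.out.println(upper("\u0131", map).equals("I"));        // true
        System.out.println(upper("\u03B1", map).equals("\u0391"));   // true
        // Simple mappings are one-to-one, so sharp s stays unchanged (no "SS").
        System.out.println(upper("\u00DF", map).equals("\u00DF"));   // true
    }
}
```

Simple mappings only cover one-to-one cases; the one-to-many ones such as ß to "SS" live in the separate `SpecialCasing.txt` file, which this sketch ignores.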

    Here is a sample implementation ;)

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // DBFactory and DBUtil are the author's own helpers (not shown); each
    // DBUtil.close(x) overload is assumed to close x quietly and return null.
    public static String upper(String s) throws SQLException {
        if (s == null) {
            return null;
        }
    
        final int N = s.length(); // Mind the optimization!
        Connection conn = null;
        PreparedStatement stmtName = null;
        PreparedStatement stmtSmall = null;
        ResultSet rsName = null;
        ResultSet rsSmall = null;
        StringBuilder buffer = new StringBuilder(N); // Much faster than StringBuffer!
        try {
            conn = DBFactory.getConnection();
            stmtName = conn.prepareStatement("select name from unicode.chart where codepoint = ?");
            // TODO Optimization: Maybe move this in the if() so we don't create this
            // unless there are lowercase characters in the string.
            stmtSmall = conn.prepareStatement("select codepoint from unicode.chart where name = ?");
            for (int i = 0; i < N; ) {
                // Read whole code points so surrogate pairs are not split in half.
                int c = s.codePointAt(i);
                i += Character.charCount(c);
                stmtName.setInt(1, c);
                rsName = stmtName.executeQuery(); // execute() would only return a boolean
                if (rsName.next()) {
                    String name = rsName.getString(1);
                    if (name.contains(" SMALL ")) {
                        name = name.replace(" SMALL ", " CAPITAL ");
    
                        stmtSmall.setString(1, name);
                        rsSmall = stmtSmall.executeQuery();
                        if (rsSmall.next()) {
                            c = rsSmall.getInt(1);
                        }
    
                        rsSmall = DBUtil.close(rsSmall);
                    }
                }
                rsName = DBUtil.close(rsName);
                buffer.appendCodePoint(c); // Keep the (possibly converted) character!
            }
        }
        finally {
            // Always clean up
            rsSmall = DBUtil.close(rsSmall);
            rsName = DBUtil.close(rsName);
            stmtSmall = DBUtil.close(stmtSmall);
            stmtName = DBUtil.close(stmtName);
            conn = DBUtil.close(conn);
        }
    
        // TODO Optimization: Maybe read the table once into RAM at the start
        // Would waste a lot of memory, though :/
        return buffer.toString();
    }
    

    ;)

    Note: The Unicode charts, which you can find on unicode.org, contain the name of each character/code point. This name contains " SMALL " for characters which are lowercase (mind the blanks, or it might match "SMALLER" and the like). Now you can search for a name with " SMALL " replaced by " CAPITAL ". If you find it, you've found the capital version.
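The name-swap trick from the note can be sketched in memory as well, which is roughly what the "read the table once into RAM" TODO would amount to. This is a minimal sketch, assuming input lines that start with "<hex code point>;<character name>" as in `UnicodeData.txt`; the class name and the abridged sample lines are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameSwapUpper {
    private final Map<Integer, String> nameOf = new HashMap<>();
    private final Map<String, Integer> codePointOf = new HashMap<>();

    // Each line: "<hex code point>;<character name>;..." as in UnicodeData.txt.
    NameSwapUpper(List<String> lines) {
        for (String line : lines) {
            String[] f = line.split(";", -1);
            int cp = Integer.parseInt(f[0], 16);
            nameOf.put(cp, f[1]);
            codePointOf.put(f[1], cp);
        }
    }

    String upper(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            String name = nameOf.get(cp);
            int result = cp;
            // Blanks on both sides so a word like "SMALLER" is not matched.
            if (name != null && name.contains(" SMALL ")) {
                String capitalName = name.replace(" SMALL ", " CAPITAL ");
                result = codePointOf.getOrDefault(capitalName, cp);
            }
            out.appendCodePoint(result);
        });
        return out.toString();
    }

    public static void main(String[] args) {
        NameSwapUpper u = new NameSwapUpper(List.of(
            "0041;LATIN CAPITAL LETTER A",
            "0061;LATIN SMALL LETTER A",
            "0049;LATIN CAPITAL LETTER I",
            "0131;LATIN SMALL LETTER DOTLESS I",
            "00DF;LATIN SMALL LETTER SHARP S",
            "1E9E;LATIN CAPITAL LETTER SHARP S"));
        System.out.println(u.upper("a1a"));                       // A1A
        // No "LATIN CAPITAL LETTER DOTLESS I" in the table, so it stays unchanged.
        System.out.println(u.upper("\u0131").equals("\u0131"));   // true
        // Sharp s maps to U+1E9E, not to "SS".
        System.out.println(u.upper("\u00DF").equals("\u1E9E"));   // true
    }
}
```

The last two lines show where the heuristic parts ways with the standard case mappings: dotless i should uppercase to plain I, and the usual uppercase of ß is "SS" rather than the capital sharp s.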