
Fastest way to check a string is alphanumeric in Java


What is the fastest way to check that a String contains only alphanumeric characters?

I have a CPU-bound library for processing extremely large data files, and I am specifically looking for ways to speed up the alphanumeric check. Is there a more efficient approach than using pre-compiled regular expressions?


Solution

  • I wrote tests that compare regular expressions (as suggested in other answers) against a manual check that does not use them. The tests were run on a quad-core OS X 10.8 machine running Java 1.6.

    Interestingly, using regular expressions turns out to be about 5-10 times slower than manually iterating over the string. Furthermore, isAlphanumeric2() is marginally faster than isAlphanumeric(). isAlphanumeric() relies on Character.isLetterOrDigit(), so it also accepts Unicode letters and digits outside ASCII; isAlphanumeric2() accepts only the ASCII characters 0-9, A-Z and a-z.
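
To make the difference between the two checks concrete, here is a minimal, self-contained sketch. The class name `AlphanumericScope` and both method names are hypothetical and chosen only for illustration; the first method follows the `Character.isLetterOrDigit()` approach and the second is an ASCII-only range check written with character literals.

```java
final class AlphanumericScope {

    // Unicode-aware check: accepts any char that Character treats as a letter or digit.
    static boolean isUnicodeAlphanumeric(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetterOrDigit(str.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // ASCII-only check: accepts just 0-9, A-Z and a-z.
    static boolean isAsciiAlphanumeric(String str) {
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            boolean ok = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ascii = "ab4r3rgf";
        String accented = "caf\u00e9";        // ends with U+00E9 (e with acute accent)
        String arabicDigits = "\u0661\u0662"; // ARABIC-INDIC DIGIT ONE and TWO

        // prints "true true"
        System.out.println(isUnicodeAlphanumeric(ascii) + " " + isAsciiAlphanumeric(ascii));
        // prints "true false"
        System.out.println(isUnicodeAlphanumeric(accented) + " " + isAsciiAlphanumeric(accented));
        // prints "true false"
        System.out.println(isUnicodeAlphanumeric(arabicDigits) + " " + isAsciiAlphanumeric(arabicDigits));
    }
}
```

Both methods return true for an empty string, since there is no character to reject; add an explicit length check if that is not the desired behavior.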

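    import junit.framework.TestCase;

    public class QuickTest extends TestCase {
    
        private final int reps = 1000000;
    
        public void testRegexp() {
            // String.matches() compiles the pattern on every call and must match
            // the whole string, so the character class needs the + quantifier.
            for(int i = 0; i < reps; i++)
                ("ab4r3rgf"+i).matches("[a-zA-Z0-9]+");
        }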
    
        public void testIsAlphanumeric() {
            for(int i = 0; i < reps; i++)
                isAlphanumeric("ab4r3rgf"+i);
        }
    
        public void testIsAlphanumeric2() {
            for(int i = 0; i < reps; i++)
                isAlphanumeric2("ab4r3rgf"+i);
        }
    
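        // Unicode-aware: accepts any character that Character.isLetterOrDigit()
        // treats as a letter or digit, not just ASCII.
        public boolean isAlphanumeric(String str) {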
            for (int i=0; i<str.length(); i++) {
                char c = str.charAt(i);
                if (!Character.isLetterOrDigit(c))
                    return false;
            }
    
            return true;
        }
    
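        // ASCII-only: rejects anything outside '0'-'9' (0x30-0x39),
        // 'A'-'Z' (0x41-0x5a) and 'a'-'z' (0x61-0x7a).
        public boolean isAlphanumeric2(String str) {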
            for (int i=0; i<str.length(); i++) {
                char c = str.charAt(i);
                if (c < 0x30 || (c >= 0x3a && c <= 0x40) || (c > 0x5a && c <= 0x60) || c > 0x7a)
                    return false;
            }
            return true;
        }
    
    }
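
The regex test above goes through `String.matches()`, which compiles the pattern on every call, whereas the question asks about pre-compiled regular expressions. A minimal sketch of that baseline is shown below so it can be timed alongside the other methods; the class name `PrecompiledRegexCheck` and the method name `isAlphanumericRegex` are hypothetical, and no timings are claimed for it here.

```java
import java.util.regex.Pattern;

final class PrecompiledRegexCheck {

    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[a-zA-Z0-9]+");

    static boolean isAlphanumericRegex(String str) {
        // Matcher.matches() requires the whole input to match,
        // so no ^ or $ anchors are needed.
        return ALPHANUMERIC.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumericRegex("ab4r3rgf42")); // prints "true"
        System.out.println(isAlphanumericRegex("ab4r3 rgf"));  // prints "false"
        System.out.println(isAlphanumericRegex(""));           // prints "false"
    }
}
```

Note that with the `+` quantifier an empty string is rejected, while the loop-based methods above accept it; use `*` instead if the two should agree on empty input.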