How to trim a Java StringBuilder?


I have a StringBuilder object that needs to be trimmed (i.e. all whitespace chars \u0020 and below removed from either end).

I can't seem to find a method in StringBuilder that would do this.

Here's what I'm doing now:

String trimmedStr = strBuilder.toString().trim();

This gives exactly the desired output, but it requires two Strings to be allocated instead of one. Is there a more efficient way to trim the string while it's still in the StringBuilder?


Solution

  • You should not use the deleteCharAt approach.

    As Boris pointed out, the deleteCharAt method shifts every char after the deleted index with an array copy on each call. The Java 5 code that does this looks like this:

    public AbstractStringBuilder deleteCharAt(int index) {
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        System.arraycopy(value, index+1, value, index, count-index-1);
        count--;
        return this;
    }
    

    Of course, speculation alone is not enough to choose one method of optimization over another, so I decided to time the 3 approaches in this thread: the original, the delete approach, and the substring approach.

    Here is the code I tested for the original:

    public static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }
    

    The delete approach:

    public static String trimDelete(StringBuilder sb) {
        while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0))) {
            sb.deleteCharAt(0);
        }
        while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) {
            sb.deleteCharAt(sb.length() - 1);
        }
        return sb.toString();
    }
    

    And the substring approach:

    public static String trimSubstring(StringBuilder sb) {
        int first, last;
    
        for (first=0; first<sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;
    
        for (last=sb.length(); last>first; last--)
            if (!Character.isWhitespace(sb.charAt(last-1)))
                break;
    
        return sb.substring(first, last);
    }
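
    One caveat about the two methods above: they test Character.isWhitespace, whereas String.trim() strips every char at or below U+0020, which is the definition used in the question. The two predicates disagree on some characters, for example U+0001 (stripped by trim() but not whitespace) and U+2003 (whitespace but kept by trim()). A minimal sketch of a substring variant that follows trim()'s rule instead; the class and method names are made up for illustration:

    ```java
    final class TrimLikeString {
        // Same scan as trimSubstring, but with the predicate String.trim() uses:
        // a char is stripped when it is at or below U+0020.
        static String trimSubstringLikeTrim(StringBuilder sb) {
            int first = 0;
            int last = sb.length();
            while (first < last && sb.charAt(first) <= ' ') {
                first++;
            }
            while (last > first && sb.charAt(last - 1) <= ' ') {
                last--;
            }
            return sb.substring(first, last);
        }

        public static void main(String[] args) {
            StringBuilder sb = new StringBuilder("\u0001 abc\t ");
            // Compare against the question's original toString().trim() call.
            System.out.println(trimSubstringLikeTrim(sb).equals(sb.toString().trim()));
        }
    }
    ```

    For input that contains only ordinary spaces, as in the timings below, both predicates give the same result.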
    

    I ran 100 tests, each time generating a million-character StringBuilder with ten thousand leading and ten thousand trailing spaces. The testing itself is very basic, but it gives a good idea of how long the methods take.

    Here is the code to time the 3 approaches:

    public static void main(String[] args) {
    
        long originalTime = 0;
        long deleteTime = 0;
        long substringTime = 0;
    
        for (int i=0; i<100; i++) {
    
            StringBuilder sb1 = new StringBuilder();
            StringBuilder sb2 = new StringBuilder();
            StringBuilder sb3 = new StringBuilder();
    
            for (int j=0; j<10000; j++) {
                sb1.append(" ");
                sb2.append(" ");
                sb3.append(" ");
            }
            for (int j=0; j<980000; j++) {
                sb1.append("a");
                sb2.append("a");
                sb3.append("a");
            }
            for (int j=0; j<10000; j++) {
                sb1.append(" ");
                sb2.append(" ");
                sb3.append(" ");
            }
    
            long timer1 = System.currentTimeMillis();
            trimOriginal(sb1);
            originalTime += System.currentTimeMillis() - timer1;
    
            long timer2 = System.currentTimeMillis();
            trimDelete(sb2);
            deleteTime += System.currentTimeMillis() - timer2;
    
            long timer3 = System.currentTimeMillis();
            trimSubstring(sb3);
            substringTime += System.currentTimeMillis() - timer3;
        }
    
        System.out.println("original:  " + originalTime + " ms");
        System.out.println("delete:    " + deleteTime + " ms");
        System.out.println("substring: " + substringTime + " ms");
    }
    

    I got the following output:

    original:  176 ms
    delete:    179242 ms
    substring: 154 ms
    

    As we see, the substring approach is only slightly faster than the original "two String" approach (154 ms versus 176 ms in total over the 100 runs). The delete approach, at 179242 ms, is roughly a thousand times slower and should be avoided.
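
    The size of that gap follows from the deleteCharAt source quoted earlier: each call at index 0 shifts count - 1 chars, while a call at the last index shifts none. A rough back-of-the-envelope sketch, assuming the Java 5 implementation shown above and the benchmark's sizes (1,000,000 chars, 10,000 leading spaces); the class and method names are illustrative only:

    ```java
    final class DeleteCostEstimate {
        // Chars moved by calling deleteCharAt(0) 'leading' times on a builder
        // that starts with 'length' chars. Each call copies count - index - 1
        // chars, and index is always 0 here.
        static long shiftedByRepeatedDelete(long length, long leading) {
            long total = 0;
            for (long i = 0; i < leading; i++) {
                long count = length - i; // chars left before this call
                total += count - 1;
            }
            return total;
        }

        // Chars moved by a single delete(0, leading): only the kept tail shifts.
        static long shiftedBySingleDelete(long length, long leading) {
            return length - leading;
        }

        public static void main(String[] args) {
            System.out.println(shiftedByRepeatedDelete(1_000_000L, 10_000L)); // 9949995000
            System.out.println(shiftedBySingleDelete(1_000_000L, 10_000L));   // 990000
        }
    }
    ```

    Under those assumptions, the loop of single deletes moves about 9.95 billion chars per test, against 990,000 for one range delete.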

    So to answer your question: you are fine trimming your StringBuilder the way you suggested in the question. The very slight optimization that the substring method offers probably does not justify the excess code.
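
    If the result has to stay in the StringBuilder rather than become a String, the leading and trailing runs can each be removed in a single step. This is a sketch, not a measured alternative: it was not part of the timings above, it uses the question's definition of whitespace (chars at or below U+0020), and the class and method names are hypothetical.

    ```java
    final class StringBuilderTrim {
        // Trims the builder in place and returns it for chaining.
        static StringBuilder trimInPlace(StringBuilder sb) {
            int first = 0;
            int last = sb.length();
            while (first < last && sb.charAt(first) <= ' ') {
                first++;
            }
            while (last > first && sb.charAt(last - 1) <= ' ') {
                last--;
            }
            sb.setLength(last);  // truncate the trailing run first
            sb.delete(0, first); // then remove the leading run with one range delete
            return sb;
        }

        public static void main(String[] args) {
            StringBuilder sb = new StringBuilder("  hello world \n");
            trimInPlace(sb);
            System.out.println("[" + sb + "]"); // prints [hello world]
        }
    }
    ```

    Truncating before deleting keeps the range delete from shifting chars that are about to be dropped anyway.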