Search code examples
javastringparsingsplitstringtokenizer

Most efficient way of splitting String in Java


For the sake of this question, let's assume I have a String which contains the values Two;.Three;.Four (and so on) but the elements are separated by ;..

Now I know there are multiple ways of splitting a string such as split() and StringTokenizer (being the faster one and works well) but my input file is around 1GB and I am looking for something slightly more efficient than StringTokenizer.

After some research, I found that indexOf and substring are quite efficient but the examples only have single delimiters or results are returning only a single word/element.

Sample code using indexOf and substring:

String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);

The above works for printing brown but how can I use indexOf and substring to split a line with multiple delimiters and display all the items as below.

Expected output

Two
Three
Four
....and so on

Solution

  • StringTokenizer is faster than StringBuilder.

    public static void main(String[] args) {
    
        String str = "This is String , split by StringTokenizer, created by me";
        StringTokenizer st = new StringTokenizer(str);
    
        System.out.println("---- Split by space ------");
        while (st.hasMoreElements()) {
            System.out.println(st.nextElement());
        }
    
        System.out.println("---- Split by comma ',' ------");
        StringTokenizer st2 = new StringTokenizer(str, ",");
    
        while (st2.hasMoreElements()) {
            System.out.println(st2.nextElement());
        }
    }