Search code examples
javaregexsearchsubstringsearch-engine

what is the fastest substring search method in Java


I need to implement a way to search substring (needles) in a list of string (haystack) using Java.

More specifically, my app has a list of user profiles. If I type some letters, for example, "Ja", and then search, then all the users whose name contains "ja" should show up. For instance, the result could be "Jack", "Jackson", "Jason", "Dijafu".

In Java, as I know, there are 3 build-in method to see search substring in a string.

  1. string.contains()

  2. string.indexOf()

  3. regular expression. it is something like string.matches("ja"))

My question is: What are the runtimes of each method above? which one is the fastest or most efficient or most popular way to check if the list of string contains a given substring.

I know there exists some algorithms that do the same thing, such as Boyer–Moore string search algorithm, Knuth–Morris–Pratt algorithm and so on. I do not want to use them because I just have a small list of strings, and I think using them is kind of overkill for me right now. Also I have to type a lot of extra coding for such a non-build-in algorithm. If you think my thoughts is not correct, please feel free to correct me.


Solution

  • String[] names = new String[]{"jack", "jackson", "jason", "dijafu"};
    long start = 0;
    long stop = 0;
    
    //Contains
    start = System.nanoTime();
    for (int i = 0; i < names.length; i++){
        names[i].contains("ja");
    }
    stop = System.nanoTime();
    System.out.println("Contains: " + (stop-start));
    
    //IndexOf
    start = System.nanoTime();
    for (int i = 0; i < names.length; i++){
        names[i].indexOf("ja");
    }
    stop = System.nanoTime();
    System.out.println("IndexOf: " + (stop-start));
    
    //Matches
    start = System.nanoTime();
    for (int i = 0; i < names.length; i++){
        names[i].matches("ja");
    }
    stop = System.nanoTime();
    System.out.println("Matches: " + (stop-start));
    

    Output:

    Contains: 16677
    IndexOf: 4491
    Matches: 864018