Search code examples
javacollectionspartialstring-matching

Fastest way to find Strings in String collection that begin with certain chars


I have a large collection of Strings. I want to be able to find the Strings that begin with "Foo" or the Strings that end with "Bar". What would be the best Collection type to get the fastest results? (I am using Java)

I know that a HashSet is very fast for complete matches, but not for partial matches I would think? So, what could I use instead of just looping through a List? Should I look into LinkedList's or similar types? Are there any Collection Types that are optimized for this kind of queries?


Solution

  • The best collection type for this problem is SortedSet. You would need two of them in fact:

    1. Words in regular order.
    2. Words with their characters inverted.

    Once these SortedSets have been created, you can use method subSet to find what you are looking for. For example:

    1. Words starting with "Foo":

       forwardSortedSet.subSet("Foo","Fop");
      
    2. Words ending with "Bar":

       backwardSortedSet.subSet("raB","raC");
      

    The reason we are "adding" 1 to the last search character is to obtain the whole range. The "ending" word is excluded from the subSet, so there is no problem.

    EDIT: Of the two concrete classes that implement SortedSet in the standard Java library, use TreeSet. The other (ConcurrentSkipListSet) is oriented to concurrent programs and thus not optimized for this situation.