java algorithm matching string-matching string-search

String searching algorithms in Java


I am doing string matching on a large amount of data.

EDIT: I am matching the words in a large list against some ontology text files. I take each ontology file and look for a match between the third string on each line of the file and any word from the list.

I made the mistake of overlooking the fact that pure (exact) matching is not what I need (the results are poor). I need a looser matching function that also returns a result when one string is contained inside another.

I did this with a radix trie; it was very fast and worked nicely, but now I guess that work is useless, because a trie returns only exact matches. :/

  • Are string searching algorithms the kind of algorithm that does this?
  • Can somebody suggest Java implementations that they have experience with?

The algorithm should be fast, but speed is not the top priority; I would accept a compromise between speed and complexity.

I am very grateful for all advice/examples/explanations/links!

Thank you!


Solution

  • You might find Suffix Trees useful (they are similar in concept to Tries).

    Prepend ^ and append $ to each string, then build a single suffix tree over all the strings concatenated together. Space usage will be O(n), where n is the total length of the strings, and will probably be worse than what you had for the trie.

    If you now need to search for a string s, you can do it in O(|s|) time, just as with a trie, and the match you get will be a substring match (basically, you will be matching a prefix of some suffix of some string).

    Sorry, I don't have a reference to a Java implementation handy.

    Found a useful Stack Overflow answer: Generalized Suffix Tree Java Implementation

    Which has: http://illya-keeplearning.blogspot.com/2009/04/suffix-trees-java-ukkonens-algorithm.html

    Which in turn has: Source Code: http://illya.yolasite.com/resources/suffix-tree.zip
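
To make the substring lookup described above concrete, here is a minimal sketch. It is not the linked Ukkonen implementation and not a true suffix tree: it is a naive suffix trie that inserts every suffix of every word, so building it costs O(m^2) time and space per word of length m instead of linear. The class and method names (SuffixTrieSketch, add, wordsContaining) are made up for illustration. Because each node remembers which words pass through it, the ^ and $ delimiters are not needed in this variant.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Naive generalized suffix trie: every suffix of every indexed word is inserted.
// Hypothetical names, for illustration only; not a linear-time suffix tree.
class SuffixTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Words that contain the substring spelled by the path to this node.
        final Set<String> words = new HashSet<>();
    }

    private final Node root = new Node();

    // Insert all suffixes of word: O(|word|^2) steps in the worst case.
    void add(String word) {
        for (int start = 0; start < word.length(); start++) {
            Node node = root;
            for (int i = start; i < word.length(); i++) {
                node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
                node.words.add(word);
            }
        }
    }

    // Return every indexed word that contains pattern, walking |pattern| nodes.
    // An empty pattern returns an empty set in this sketch.
    Set<String> wordsContaining(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.get(pattern.charAt(i));
            if (node == null) {
                return new TreeSet<>(); // no indexed word contains the pattern
            }
        }
        return new TreeSet<>(node.words); // sorted copy for stable output
    }

    public static void main(String[] args) {
        SuffixTrieSketch index = new SuffixTrieSketch();
        index.add("cardiology");
        index.add("card");
        index.add("ontology");
        System.out.println(index.wordsContaining("olog")); // [cardiology, ontology]
        System.out.println(index.wordsContaining("card")); // [card, cardiology]
        System.out.println(index.wordsContaining("xyz"));  // []
    }
}
```

For a short word list this may already be good enough; for large inputs the quadratic memory use is the reason to move to a real suffix tree such as the implementation linked above.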