pattern-matching, data-mining, suffix-tree

Discovering every pattern in an arbitrary string and counting duplicates


I am wondering what is the best way to find patterns in an arbitrary string and count them to get the most common ones.

Basically, I have a time series that I translated into letters of a finite alphabet (let's assume 20 letters), creating a single huge string. What is the best way to find and count patterns? Parameters could be used to limit the number of characters in a pattern, for instance a minimum of 4 and a maximum of 30 letters.

Are suffix trees an option? Or is there any data mining technique that can do this?


Solution

  • https://en.m.wikipedia.org/wiki/Sequential_pattern_mining

    Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.

    You can then use FP-growth-like algorithms.
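
    As a baseline to compare against before reaching for a mining library, here is a minimal sketch that simply counts every substring whose length falls within the bounds from the question (4 to 30). It is not FP-growth and not a suffix tree, just a brute-force sliding window. The function name count_patterns and its parameters are made up for illustration, and overlapping occurrences are counted separately.

```python
from collections import Counter


def count_patterns(text, min_len=4, max_len=30, min_count=2):
    """Count every substring of `text` with length in [min_len, max_len].

    Hypothetical helper for illustration only. Returns (pattern, count)
    pairs, most frequent first, keeping patterns seen at least
    `min_count` times. Overlapping occurrences are counted.
    """
    counts = Counter()
    n = len(text)
    for length in range(min_len, min(max_len, n) + 1):
        # Slide a window of the current length across the whole string.
        for start in range(n - length + 1):
            counts[text[start:start + length]] += 1
    # Drop patterns that do not repeat often enough.
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    series = "ABCDABCDXYABCD"  # toy stand-in for the encoded time series
    for pattern, count in count_patterns(series, min_len=4, max_len=6):
        print(pattern, count)
```

    This stores one counter entry per distinct substring, so memory grows roughly with the string length times the number of allowed pattern lengths. For a really huge string, a suffix array or suffix tree avoids materializing every substring and is probably the better fit.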