pattern-matching data-mining suffix-tree

Discover every pattern in an arbitrary string and count duplicates

I am wondering what the best way is to find patterns in an arbitrary string and count them, so that I can get the most common ones.

Basically, I have a time series that I translated into letters of a finite alphabet (let's assume 20 letters), creating a single huge string. What is the best way to find and count patterns? Parameters could be used to limit the length of a pattern, for instance a minimum of 4 and a maximum of 30 letters.
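As a baseline, the most direct approach is to slide a window of every allowed length over the string and count each substring in a hash map. The sketch below assumes overlapping occurrences should be counted, and `count_substrings` is a hypothetical helper name. It is simple, but it can store up to (max_len - min_len + 1) * n distinct keys, so on a huge string with lengths 4 to 30 it will likely run out of memory.

```python
from collections import Counter


def count_substrings(s, min_len, max_len):
    """Count every overlapping substring whose length is in [min_len, max_len]."""
    counts = Counter()
    n = len(s)
    for length in range(min_len, max_len + 1):
        # Slide a window of this length over the whole string.
        for start in range(n - length + 1):
            counts[s[start:start + length]] += 1
    return counts


if __name__ == "__main__":
    series = "abcabcabdabc"  # toy symbolized time series (made-up data)
    for pattern, count in count_substrings(series, 3, 4).most_common(3):
        print(pattern, count)
```

`Counter.most_common(k)` then gives the k most frequent patterns directly.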

Are suffix trees an option? Or is there any data mining technique that can do this?
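Suffix trees are an option: every repeated substring ends at an internal node or on the edge just above one, and its number of occurrences is the number of leaves below that point. A suffix array plus an LCP (longest common prefix) array encodes the same information more compactly. In the sketch below (all function names are hypothetical), suffixes that share their first k characters sit next to each other in sorted order, so each run of adjacent LCP values >= k yields one length-k substring and its count. The suffix array is built naively by sorting, which only suits small inputs; real data would need a linear-time or O(n log n) construction.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes: O(n^2 log n) time and memory-hungry.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai's algorithm: lcp[r] = common prefix length of suffixes sa[r-1] and sa[r].
    n = len(s)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def repeated_substrings(s, min_len, max_len, min_count=2):
    """Return {substring: occurrences} for lengths in [min_len, max_len]."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    result = {}
    for k in range(min_len, max_len + 1):
        run = 1  # size of the current block of adjacent suffixes sharing a k-prefix
        for r in range(1, len(sa) + 1):
            if r < len(sa) and lcp[r] >= k:
                run += 1
            else:
                start = sa[r - 1]
                # Skip suffixes shorter than k (possible only when run == 1).
                if run >= min_count and start + k <= len(s):
                    result[s[start:start + k]] = run
                run = 1
    return result
```

Only the counting loop depends on the length limits; the suffix and LCP arrays are built once for the whole string.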

Solution

https://en.m.wikipedia.org/wiki/Sequential_pattern_mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.

You can then use FP-growth-like (pattern-growth) algorithms.
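FP-growth itself mines frequent itemsets, and standard sequential pattern miners such as PrefixSpan allow gaps between symbols, so neither is a drop-in fit for contiguous patterns in one long string. The sketch below is a simplified pattern-growth variant adapted to that case; it is an assumption for illustration, not a specific published algorithm. Each pattern keeps the list of positions where it occurs (its projection), and it is extended only while it meets a minimum support, because an extension can never occur more often than the pattern it extends. That pruning avoids enumerating every substring up to the maximum length. `mine_frequent_substrings` and `min_support` are hypothetical names.

```python
from collections import defaultdict


def mine_frequent_substrings(s, min_len, max_len, min_support):
    """Return {pattern: occurrences} for contiguous patterns that occur at least
    min_support times (min_support >= 1), with length in [min_len, max_len]."""
    results = {}

    def grow(pattern, starts):
        length = len(pattern)
        if length >= min_len:
            results[pattern] = len(starts)
        if length == max_len:
            return
        # Project: group the occurrences by the character that follows them.
        extensions = defaultdict(list)
        for start in starts:
            nxt = start + length
            if nxt < len(s):
                extensions[s[nxt]].append(start)
        # Recurse only into extensions that are still frequent (anti-monotone pruning).
        for ch, ext_starts in extensions.items():
            if len(ext_starts) >= min_support:
                grow(pattern + ch, ext_starts)

    # Seed the search with the frequent single symbols.
    seeds = defaultdict(list)
    for i, ch in enumerate(s):
        seeds[ch].append(i)
    for ch, starts in seeds.items():
        if len(starts) >= min_support:
            grow(ch, starts)
    return results
```

For example, `sorted(mine_frequent_substrings(series, 4, 30, 50).items(), key=lambda kv: kv[1], reverse=True)[:10]` would list the ten most frequent patterns of length 4 to 30 that occur at least 50 times; the threshold of 50 is arbitrary and would need tuning to the data.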