I have sets of texts. The texts are rather short (up to 500 characters). The sets consist of up to 10 texts. I try to find phrases as long as possible which occur in most of the texts. In other words I am looking for identical substrings. The longer the better and the more texts they occur in also the better.
Example:
Phrases (one word phrases ommitted):
Phrases like "brown fox" or "fox jumps" are omitted, because they are subphrases of longer phrases.
I am looking for an algorithm to find those phrases.
Finding the longuest commun substring is a commun DP algorithm explained pretty well here. https://www.geeksforgeeks.org/longest-common-substring-dp-29/ . After to find the occurence of the strings in the set of text you can simply use a code loke this.
substring = "red brown fox"
n = 0
for text in texts:
if substring in text:
n = n + 1
print(substring, n)