I am looking for an algorithm to find the dominant cyclic substring of a given string.
A cyclic substring:
A dominant cyclic substring:
Example 1:
prefixgarbagecyclecyclecyclecyclesufixgarbage
cycle
:=> cycle
is the most repeated adjacent substringExample 2:
prefixgarbagecyclepadinggarbagecyclesufixgarbage
g
:=> occurrences of cycle
are not repeated adjacently, g
repeats twice adjacentlyExample 3:
prefixgarbagecyclecyclepadinggarbageprefixgarbage
cycle
:=> cycle
& g
repeat twice adjacently but cycle
is longer then g
Example 4:
prefixgarbagecyclecyclecycleroundroundroundprefixgarbage
cycle
:=> cycle
& round
repeat thrice adjacently & same length but cycle
appeared firstExampe 5:
abcdefghijklmnopqrstuvqxyz
<empty string>
because there is no repeated adjacent substringWhat is the best approach for implementing this algorithm?
Cannot find anything better than this quadratic-time algorithm (implemented in Python):
IREP, IPER, IPOS = 0, 1, 2
def find_dominant(src):
best = (0, 0, 0) # repetitions-1, period, position
period = 0
while period < len(src) // max(2, 1 + best[IREP]):
period += 1
length = 0
for pos in range(len(src) - 1 - period, -1, -1):
if src[pos] == src[pos + period]:
length += 1
repetitions = length // period
if repetitions >= best[IREP]:
best = (repetitions, period, pos)
else:
length = 0
return best
s = "prefixgarbagecyclecyclecyclecyclesufixgarbage"
res = find_dominant(s)
if res[0] == 0:
print("nothing found")
else:
print(res[IREP] + 1, '*', s[res[IPOS]: res[IPOS] + res[IPER]])
For each possible period scan the string and remember the longest periodic subsequence. Scan it backwards to check less conditions. Stop increasing period when no further improvement could be found.
Time complexity is O(N2 / R), where R is the number of repetitions of dominant substring. Space complexity is O(1).