string algorithm language-agnostic substring cycle

Find the dominant cyclic substring of a given string

I am looking for an algorithm to find the dominant cyclic substring of a given string.

A cyclic substring:

a substring which is repeated two or more times adjacently.

A dominant cyclic substring:

The substring which has the most adjacent repetitions is dominant
(adjacent repetitions occur an equal number of times)
- the longest length substring is dominant
(on ties of length and adjacent repetitions)
- the substring that appears first is dominant

Example 1:

prefixgarbagecyclecyclecyclecyclesufixgarbage
returns cycle:=> cycle is the most repeated adjacent substring

Example 2:

prefixgarbagecyclepadinggarbagecyclesufixgarbage
returns g:=> occurrences of cycle are not repeated adjacently, g repeats twice adjacently

Example 3:

prefixgarbagecyclecyclepadinggarbageprefixgarbage
returns cycle:=> cycle & g repeat twice adjacently but cycle is longer then g

Example 4:

prefixgarbagecyclecyclecycleroundroundroundprefixgarbage
returns cycle:=> cycle & round repeat thrice adjacently & same length but cycle appeared first

Exampe 5:

abcdefghijklmnopqrstuvqxyz
returns <empty string> because there is no repeated adjacent substring

What is the best approach for implementing this algorithm?

Solution

Cannot find anything better than this quadratic-time algorithm (implemented in Python):

IREP, IPER, IPOS = 0, 1, 2

def find_dominant(src):
  best = (0, 0, 0) # repetitions-1, period, position

  period = 0
  while period < len(src) // max(2, 1 + best[IREP]):
    period += 1
    length = 0

    for pos in range(len(src) - 1 - period, -1, -1):
      if src[pos] == src[pos + period]:
        length += 1
        repetitions = length // period
        if repetitions >= best[IREP]:
          best = (repetitions, period, pos)
      else:
        length = 0

  return best


s = "prefixgarbagecyclecyclecyclecyclesufixgarbage"
res = find_dominant(s)
if res[0] == 0:
  print("nothing found")
else:
  print(res[IREP] + 1, '*', s[res[IPOS]: res[IPOS] + res[IPER]])

For each possible period scan the string and remember the longest periodic subsequence. Scan it backwards to check less conditions. Stop increasing period when no further improvement could be found.

Time complexity is O(N² / R), where R is the number of repetitions of dominant substring. Space complexity is O(1).