Search code examples
pythonalgorithmpalindrome

Build Palindrome from two strings


I want to write a python function that does this efficiently:

The function will take two strings, 'a' and 'b', and attempt to find the longest palindromic string that can be formed such that it is a concatenation of a non-empty substring of 'a' and a non-empty substring of 'b'. If there are multiple valid answers, it will return the lexicographically smallest one. If no such string can be formed, it will return '-1'.

I have an inefficient solution that generates all the substrings of both strings, and then creates all possible concatenations whle tracking the longest which is a valid palindrome:

def is_palindrome(word):
    """Check if a word is a palindrome."""
    reversed_word = word[::-1]
    return word == reversed_word


def all_substrings_of_word(word):
    """Generate all possible non-empty substrings of a given string."""
    substrings = []
    for sub_string_length in range(1, len(word) + 1):
        for i in range(len(word) - sub_string_length + 1):
            new_word = word[i:i + sub_string_length]
            substrings.append(new_word)
    return substrings


def buildPalindrome(a, b):
    """Attempt to find the longest palindromic string created by concatenating
    a substring of `a` with a substring of `b`."""
    sub_strings_a = all_substrings_of_word(a)
    sub_strings_b = all_substrings_of_word(b)

    # Generate all possible concatenations of substrings from `a` and `b`
    multiplexed_array = [
        word_a + word_b for word_a in sub_strings_a for word_b in sub_strings_b]

    # Find the best palindrome (longest, then lexicographically smallest)
    best_palindrome = ""
    for word in multiplexed_array:
        if is_palindrome(word):
            if len(word) > len(best_palindrome):
                best_palindrome = word
            elif len(word) == len(best_palindrome) and word < best_palindrome:
                best_palindrome = word

    return best_palindrome if best_palindrome else "-1"

print(buildPalindrome("bac", "bac"))   # EXPECTED OUTPUT -- aba
print(buildPalindrome("abc", "def"))   # EXPECTED OUTPUT -- -1
print(buildPalindrome("jdfh", "fds"))   # EXPECTED OUTPUT -- dfhfd

Can I please get an explanation on how this can be improved?


Solution

  • You could take this approach:

    • Build a trie for all substrings in b. A suffix tree would be even better, as it is more efficient.

    • Consider all possible "centers" for potential palindromes in the string a. So these can be between two consecutive characters (when palindrome has an even size) or on a character (when palindrome has an odd size). For each of these centers do:

      • Find the largest palindrome p at that center only considering string a

      • extend p to the left as long as the added characters (in the order of being added) are a word in the trie of b. This is a potential solution. Compare it with the longest palindrome so far to retain the longest.

      • If it is not possible to extend p in this way, then shorten p until a character is so removed that exists in b. In that case we have a potential solution.

      • If in the latter case there are no characters in p that occur in b, then we have no suitable palindrome at the chosen center.

    Then turn the tables and apply the above procedure where a becomes the reversal of b, and b the reversal of a. This practically means we search for palindrome centers in the original b.

    Here is an implementation of that idea:

    # Of the two given strings choose the one that is longest, or if equal, comes first in lexical order
    def longest(x, y):
        return min((-len(x), x), (-len(y), y))[1]
    
    def buildTrie(s):
        trie = {}
        for i in range(len(s)):
            node = trie
            for j in range(i, len(s)):
                node = node.setdefault(s[j], {})
        return trie
    
    def buildPalindromeTrincot(s1, s2):
        palin = ""
        # Two passes: one for where the center of a palindrome is in s1, one for the reverse case
        for a, b in ((s1, s2), (s2[::-1], s1[::-1])):
            # Build a trie for B substrings (could be suffixtree for better performance)
            trie = buildTrie(b)
            n = len(a)
            # Visit all possible centers in A for a potential solution of at least 2 characters 
            for center in range(2*n-1, 0, -1):
                # Get the offsets of the innermost characters that must be equal 
                #   for a palindrome of at least two characters
                mid1 = (center - 1)//2
                mid2 = (center + 2)//2
                # Get largest palindrome at this center in A alone: 
                #   `left` will point to the left-neighboring character to that palindrome
                left = next((left for left, right in zip(range(mid1, 0, -1), range(mid2, n)) 
                             if a[left] != a[right]), 
                            max(0, mid1 + mid2 - n))
                # Must extend the palindrome with a substring from B
                node = trie.get(a[left], None)
                if node is not None:  # We can extend the palindrome using B
                    for left in range(left-1, -1, -1):
                        if a[left] not in node:
                            left += 1
                            break
                        node = node[a[left]]                    
                else: 
                    # See if we can drop characters from the palindrome in A 
                    #    until we can replace one with the same character from B
                    left = next((left for left in range(left+1, mid1+1) if a[left] in trie), None)
                    if left is None:
                        continue  # No solution found here
                palin = longest(a[left:mid2] + a[left:mid1+1][::-1], palin)
                    
        return palin or "-1"
    

    For input strings of around 40, this implementation runs 100 times faster than the original code you provided. For inputs of size 70, that becomes a factor of 1000 times. For inputs with strings of size 500, this implementation returns an answer in less than a second.

    There is still room for improvement, like:

    • Using a suffix tree instead of a trie
    • Exiting early when the future palindromes can never be longer than the one already found
    • Have the center move from the center of a+b and "fan" outward, so that a larger palindrome is found sooner. For this you'll need to have both tries built first as you'll toggle between one and the other.

    But as the above code already brought a dramatic improvement, I didn't pursue any of these improvements.