Search code examples
algorithmpalindromerewriting

Efficiently constructing palindromes using string rewriting system


I am trying to solve the following challenge:

Given a string rewriting system with the following rules:

  1. c -> a c b
  2. c -> b a c a
  3. c -> b c b a b

Given a starting string c, find an efficient approach to constructing a palindrome.

Construction example with rule sequence (1, 1, 2):

c -(rule 1)-> acb -(rule 1)-> aacbb -(rule 2)-> aabacabb

My attempts:

  1. Generating permutations: I considered generating all permutations of a sequence of rewriting steps (n steps) but found it computationally expensive.
  2. Balancing approach: Rule 2 unbalances the string leftward, while rule 3 unbalances it rightward. Rule 1 is neutral. This led me to explore sequences where rule 2 is applied twice as often as rule 3 to maintain balance.
  3. Last rule constraint: Applying rule 1 last cannot create a palindrome, as it would introduce a non-palindromic trigram (acb) in the middle.

While balancing and last rule constraints reduce the search space, brute-forcing remains computationally intensive. Are there more structural approaches or tighter constraints to solve this problem efficiently?

I know that a solution to this within a sequence of 150 steps exists.


Solution

  • This comes down to searching a path in a graph, where the vertices are the current string of "a", "b" and "c", and the edges are the applications of one of the three transition rules.

    Some preliminary observations

    Not all rules lead to a state that can still become a palindrome, as also is apparent from the example that is given in the question. So those edges can be discarded. For example, from the very first state there is only one transition that still allows a palindrome to be formed, which is the rule c→bcbab. The other two rules would put different characters from the set ("a", "b") at the outer ends of the resulting string, which can never change again, and thus can never fit a palindrome.

    Once a transition is performed, the characters at the outside of the string that are equal can be discarded, as they will be part of the final palindrome, and can never change anyway. By this elimination we can unify vertices in the search tree, as only the non-eliminated characters are relevant for the rest of the actions.

    We will end up with a string that either starts with "c" or ends with "c", but it turns out that with these rules the "c" is always at the left. After the very first transition (where we have no choice), and the removal of the equal "b", the "c" is at the left of the resulting state: "cba". All transitions which are now possible, will always result in "c" at the left. Here are the patterns that we could get, and which rules can be applied. The asterisk represents any sequence of a and b (including no sequence at all):

    pattern (current state) rule result remainder (next state)
    c*a c→acb acb*a cb*
    cb c→baca bacab c (solved)
    c*ab c→baca baca*ab ca*
    c*b c→bcbab bcbab*b cbab*

    So we actually only need to look at the remaining characters at the right of this "c" as there will never be anything that remains (to be resolved) at the left of it.

    Solving the path-finding problem

    You could apply a breath-first search, where you push an extension (of a state to the next state) to a queue. But some transitions are more interesting than others. In case the state ends in a there is no choice, but when it is b there are potentially two rules to choose from (see above table). When the current string ends with "ab" we could apply rule c→baca (transition 3 in the above table) or rule c→bcbab (transition 4 in the above table). The first one is more interesting because it eliminates two characters from both sides, and only adds one to the remainder: this is the only transition that reduces the length of the remaining characters. The other one is less interesting, as it makes the remaining string longer.

    So we could improve a plain breadth-first search by giving some priority to states in the queue. A very basic improvement would be to use a deque, and put the state at the front when it is shorter than the state we came from (in terms of the "remaining" string), and at the back of the deque otherwise. This turns out to be a critical improvement.

    Implementation

    Here is an implementation in JavaScript:

    function getSequence() {
        // We use a "primitive" priority queue: a deque
        // A deque's entry has two parts:
        //   - an array with applied actions (i.e. rules as strings)
        //   - the string at the right side of "c", but in reverse (for easier manipulation)
        // At the start only rule "c→bcbab" works out, and the outer "b" can be removed.
        // So the deque will start with this "cba" as entry. We don't store the "c", as it is always at the left.
        //  and the remainder of the string is stored in reverse, so "ab":
        const deque = [ [["bcbab"], "ab"] ]; // After action [c→bcbab], we have a left-over of "cba" (reversed)
        const visited = new Set();
        for (let i = 1; i < 1e6; i++) { // Put a hard limit on the process
            const [seq, right] = deque.shift(); // Get a next state
            if (visited.has(right)) {
                continue; // Avoid processing the same state twice
            }
            visited.add(right);
            if (right == "b") {
                seq.push("baca"); // This rule solves it
                console.log(`Solved after ${i} iterations`);
                return seq;
            } else if (right[0] == "a") { 
                seq.push("acb");
                // With this rule we can cancel out "a", but add a "b"
                deque.push([seq, right.slice(1) + "b"]);
            } else { // right[0] == "b"
                if (right[1] == "a") {
                    // As we have two applicable rules, copy the current sequence (concat).
                    // With this rule we can cancel out "ba", but add "a" (the only case where we GAIN)
                    // Give priority by enqueuing at the left side of the deque
                    deque.unshift([seq.concat("baca"), right.slice(2) + "a"]);
                }
                seq.push("bcbab");
                // With this rule we can cancel out "b", but add "bab", making it longer :-(
                deque.push([seq, right.slice(1) + "bab"]);
            }
        }
        throw "Could not find a solution in a reasonable number of iterations";
    }
    
    function applySequence(seq) {
        console.log(`The solution consists of ${seq.length} actions:`);
        let s = "c";
        let palindrome = "c";
        for (let i = 0; i < seq.length; i++) {
            const action = seq[i];
            // Apply action 
            s = s.replace("c", action);
            palindrome = palindrome.replace("c", action);
            // Keep removing outer-side characters when they are equal 
            //   so to focus the output on what remains to be done.
            while (s.length > 0 && s[0] === s.at(-1)) {
                s = s.slice(1, -1);
            }
            console.log(`${i+1}. c → ${action}: ${s}`);
        }
        console.log("palindrome:", palindrome);
    }
    
    const seq = getSequence();
    applySequence(seq);

    Produced solution:

    Runnning the snippet you'll see it almost immediately prints out a solution that consists of 44 actions:

    1. c → bcbab: cba
    2. c → acb: cbb
    3. c → bcbab: cbabb
    4. c → bcbab: cbabbab
    5. c → baca: cababb
    6. c → bcbab: cbababab
    7. c → baca: cababab
    8. c → bcbab: cbabababa
    9. c → acb: cbbababab
    10. c → baca: cabbabab
    11. c → baca: caabbab
    12. c → baca: caaabb
    13. c → bcbab: cbabaaab
    14. c → baca: cababaa
    15. c → acb: cbababa
    16. c → acb: cbbabab
    17. c → baca: cabbab
    18. c → baca: caabb
    19. c → bcbab: cbabaab
    20. c → baca: cababa
    21. c → acb: cbabab
    22. c → baca: cabab
    23. c → baca: caab
    24. c → bcbab: cbabaa
    25. c → acb: cbbaba
    26. c → acb: cbbbab
    27. c → baca: cabbb
    28. c → bcbab: cbababb
    29. c → bcbab: cbabbabab
    30. c → baca: cababbab
    31. c → baca: caababb
    32. c → bcbab: cbabaabab
    33. c → baca: cababaab
    34. c → baca: caababa
    35. c → acb: cbaabab
    36. c → baca: cabaab
    37. c → baca: caaba
    38. c → acb: cbaab
    39. c → baca: caba
    40. c → acb: cbab
    41. c → baca: cab
    42. c → baca: ca
    43. c → acb: cb
    44. c → baca: 
    

    The so constructed palindrome is:

    babbbabbabababababbaaabababbaabababaababbbababbabaababaabaababaabacabaababaabaababaababbababbbab

    This is also the shortest solution in terms of number of applied rules:

    This is an A* algorithm

    We can prove that this solution is the shortest by showing that the proposed algorithm actually performs an A* search algorithm where the heuristic admissible cost is defined by the number of "b" characters in the remainder string. Add to that the already incurred cost, which is the path length of the already performed actions, and we have the lower-bound for the total cost. It turns out that the "interesting" transition (c→baca) is the only one that keeps that cost equal (the increase of the path length is countered by reduction of the "b" count). The other (non-finishing) two actions both increase that cost with 2. This means that at any time we only have two categories of states in terms of cost: those with the current cost, and those with a cost that is two units greater. And this is exactly what we achieve with the deque where we append the states that increase the cost, and prepend those that don't: the states are visited in order of A*-cost.

    The above algorithm could also be performed by a pure breadth-first search. For this you'd replace the unshift call with a push call putting the "interesting" state also at the end of the queue like all others.

    In that case the states will be visited in order of path-length, and we will also be sure to find the shortest path to the solution. With that change, the code will run for a few minutes (135 seconds on my laptop) and eventually output the same solution, as expected.

    This is not the only solution, as obviously you could continue from the solved state and repeat the same sequence again (applying 88 actions). But discarding that, there are also some other solutions you could find, like this one, which has 56 actions, taking a different branch at step 11:

    1. c → bcbab: cba
    2. c → acb: cbb
    3. c → bcbab: cbabb
    4. c → bcbab: cbabbab
    5. c → baca: cababb
    6. c → bcbab: cbababab
    7. c → baca: cababab
    8. c → bcbab: cbabababa
    9. c → acb: cbbababab
    10. c → baca: cabbabab
    11. c → bcbab: cbababbaba
    12. c → acb: cbbababbab
    13. c → baca: cabbababb
    14. c → bcbab: cbababbabab
    15. c → baca: cabababbab
    16. c → baca: caabababb
    17. c → bcbab: cbabaababab
    18. c → baca: cababaabab
    19. c → bcbab: cbabababaaba
    20. c → acb: cbbabababaab
    21. c → baca: cabbabababa
    22. c → acb: cbabbababab
    23. c → baca: cababbabab
    24. c → baca: caababbab
    25. c → bcbab: cbabaababba
    26. c → acb: cbbabaababb
    27. c → bcbab: cbabbbabaabab
    28. c → baca: cababbbabaab
    29. c → baca: caababbbaba
    30. c → acb: cbaababbbab
    31. c → baca: cabaababbb
    32. c → bcbab: cbababaababb
    33. c → bcbab: cbabbababaabab
    34. c → baca: cababbababaab
    35. c → baca: caababbababa
    36. c → acb: cbaababbabab
    37. c → baca: cabaababbab
    38. c → baca: caabaababb
    39. c → bcbab: cbabaabaabab
    40. c → baca: cababaabaab
    41. c → baca: caababaaba
    42. c → acb: cbaababaab
    43. c → baca: cabaababa
    44. c → acb: cbabaabab
    45. c → baca: cababaab
    46. c → baca: caababa
    47. c → acb: cbaabab
    48. c → baca: cabaab
    49. c → baca: caaba
    50. c → acb: cbaab
    51. c → baca: caba
    52. c → acb: cbab
    53. c → baca: cab
    54. c → baca: ca
    55. c → acb: cb
    56. c → baca: