Search code examples
algorithmparentheses

Find the minimum number of edits to balance parentheses?


I was very confused about this question. I know about finding the edit distance between 2 strings using recursion and dynamic programming as an improvement, however am confused about how to go with this one.

Not sure if my thinking is correct. But we have a string of parenthesis which is unbalanced say

String s = "((())))";

How to find the String with balanced Parenthesis which requires minimum number of edits ?

Can some one explain this with an example ?

I am still not sure if I am explaining it correctly.


Solution

  • Given a string consisting of left and right parentheses, we are asked to balance it by performing a minimal number of delete, insert, and replace operations.

    To begin with, let's look at the input string and distinguish matched pairs from unmatched characters. We can mark all the characters belonging to matched pairs by executing the following algorithm:

    1. Find an unmarked '(' that is followed by an unmarked ')', with zero or more marked characters between the two.
    2. If there is no such pair of characters, terminate the algorithm.
    3. Otherwise, mark the '(' and the ')'.
    4. Return to step 1.

    The marked pairs are already balanced at zero cost, so the optimal course of action is to do nothing further with them.

    Now let's consider the unmarked characters. Notice that no unmarked '(' is followed by an unmarked ')', or else the pair would have been marked. Therefore, if we scan the unmarked characters from left to right, we will find zero or more ')' characters followed by zero or more '(' characters.

    To balance the sequence of ')' characters, it is optimal to rewrite every other one to '(', starting with the first one and excluding the last one. If there is an odd number of ')' characters, it is optimal to delete the last one.

    As for the sequence of '(' characters, it is optimal to rewrite every other one to ')', starting with the second one. If there is a leftover '(' character, we delete it. The following Python code implements the steps described above and displays the intermediate results.

    def balance(s):  # s is a string of '(' and ')' characters in any order
      n = len(s)
      print('original string: %s' % s)
    
      # Mark all matched pairs
      marked = n * [ False ]
      left_parentheses = []
      for i, ch in enumerate(s):
        if ch == '(':
          left_parentheses.append(i)
        else:
          if len(left_parentheses) != 0:
            marked[i] = True
            marked[left_parentheses.pop()] = True
    
      # Display the matched pairs and unmatched characters.
      matched, remaining = [], []
      for i, ch in enumerate(s):
        if marked[i]:
          matched.append(ch)
          remaining.append(' ')
        else:
          matched.append(' ')
          remaining.append(ch)
      print('  matched pairs: %s' % ''.join(matched))
      print('      unmatched: %s' % ''.join(remaining))
    
      cost = 0
      deleted = n * [ False ]
      new_chars = list(s)
    
      # Balance the unmatched ')' characters.
      right_count, last_right = 0, -1
      for i, ch in enumerate(s):
        if not marked[i] and ch == ')':
          right_count += 1
          if right_count % 2 == 1:
            new_chars[i] = '('
            cost += 1
            last_right = i
      if right_count % 2 == 1:      # Delete the last ')' if we couldn't match it.
        deleted[last_right] = True  # The cost was incremented during replacement.
    
      # Balance the unmatched '(' characters.
      left_count, last_left = 0, -1
      for i, ch in enumerate(s):
        if not marked[i] and ch == '(':
          left_count += 1
          if left_count % 2 == 0:
            new_chars[i] = ')'
            cost += 1
          else:
            last_left = i
      if left_count % 2 == 1:      # Delete the last '(' if we couldn't match it.
        deleted[last_left] = True  # This character wasn't replaced, so we must
        cost += 1                  # increment the cost now.
    
      # Display the outcome of replacing and deleting.
      balanced = []
      for i, ch in enumerate(new_chars):
        if marked[i] or deleted[i]:
          balanced.append(' ')
        else:
          balanced.append(ch)
      print('        balance: %s' % ''.join(balanced))
    
      # Display the cost of balancing and the overall balanced string.
      print('           cost: %d' % cost)
      result = []
      for i, ch in enumerate(new_chars):
        if not deleted[i]:  # Skip deleted characters.
          result.append(ch)
      print('     new string: %s' % ''.join(result))
    
    
    balance(')()(()())))()((())((')
    

    For the test case ')()(()())))()((())((', the output is as follows.

    original string: )()(()())))()((())((
      matched pairs:  ()(()())  () (())
          unmatched: )        ))  (    ((
            balance: (        )   (    )
               cost: 4
         new string: (()(()()))()((()))