Search code examples
algorithmcomputer-science

Longest substring where every character appear even number of times (possibly zero)


Suppose we have a string s. We want to find the length of the longest substring of s such that every character in the substring appears even number of times (possible zero).
WC Time: O(nlgn). WC space: O(n)

First, it's obvious that the substring must be of an even length. Second, I'm familiar with the sliding window method where we anchor some right index and look for the left-most index to match your criterion. I tried to apply this idea here but couldn't really formulate it.

Also, it seems to me like a priority queue could come in handy (since the O(nlgn) requirement is sort of hinting it)

I'd be glad for help!


Solution

  • Let's define the following bitsets:

    B[c,i] = 1 if character c appeared in s[0,...,i] even number of times.
    

    Calculating B[c,i] takes linear time (for all values):

    for all c:
      B[c,-1] = 0
    for i from 0 to len(arr):
      B[c, i] = B[s[i], i-1] XOR 1
    

    Since the alphabet is of constant size, so are the bitsets (for each i).

    Note that the condition:

    every character in the substring appears even number of times

    is true for substring s[i,j] if and only if the bitset of index i is identical to the bitset of index j (otherwise, there is a bit that repeated odd number of times in this substring ; other direction: If there is a bit that repeated number of times, then its bit cannot be identical).

    So, if we store all bitsets in some set (hash set/tree set), and keep only latest entry, this preprocessing takes O(n) or O(nlogn) time (depending on hash/tree set).

    In a second iteration, for each index, find the farther away index with identical bitset (O(1)/O(logn), depending if hash/tree set), find the substring length, and mark it as candidate. At the end, take the longest candidate.

    This solution is O(n) space for the bitsets, and O(n)/O(nlogn) time, depending if using hash/tree solution.


    Pseudo code:

    def NextBitset(B, c): # O(1) time
      for each x in alphabet \ {c}:
        B[x, i] = B[x, i-1]
       B[c, i] = B[c, i-1] XOR 1
    
    for each c in alphabet: # O(1) time
      B[c,-1] = 0
    map = new hash/tree map (bitset->int)
    
    # first pass: # O(n)/O(nlogn) time
    for i from 0 to len(s):
       # Note, we override with the latest element.
       B = NextBitset(B, s[i])
       map[B] = i
    
    for each c in alphabet: # O(1) time
      B[c,-1] = 0
    max_distance = 0
    # second pass: O(n)/ O(nlogn) time.
    for i from 0 to len(s):
      B = NextBitset(B, s[i])
      j = map.find(B)  # O(1) / O(logn)
      max_distance = max(max_distance, j-i)