Suppose we have a string
s
. We want to find the length of the longest substring ofs
such that every character in the substring appears even number of times (possible zero).
WC Time:O(nlgn)
. WC space:O(n)
First, it's obvious that the substring must be of an even length. Second, I'm familiar with the sliding window method where we anchor some right
index and look for the left-most index to match your criterion. I tried to apply this idea here but couldn't really formulate it.
Also, it seems to me like a priority queue could come in handy (since the O(nlgn)
requirement is sort of hinting it)
I'd be glad for help!
Let's define the following bitsets:
B[c,i] = 1 if character c appeared in s[0,...,i] even number of times.
Calculating B[c,i]
takes linear time (for all values):
for all c:
B[c,-1] = 0
for i from 0 to len(arr):
B[c, i] = B[s[i], i-1] XOR 1
Since the alphabet is of constant size, so are the bitsets (for each i
).
Note that the condition:
every character in the substring appears even number of times
is true for substring s[i,j]
if and only if the bitset of index i
is identical to the bitset of index j
(otherwise, there is a bit that repeated odd number of times in this substring ; other direction: If there is a bit that repeated number of times, then its bit cannot be identical).
So, if we store all bitsets in some set (hash set/tree set), and keep only latest entry, this preprocessing takes O(n)
or O(nlogn)
time (depending on hash/tree set).
In a second iteration, for each index, find the farther away index with identical bitset (O(1)/O(logn)
, depending if hash/tree set), find the substring length, and mark it as candidate. At the end, take the longest candidate.
This solution is O(n)
space for the bitsets, and O(n)/O(nlogn)
time, depending if using hash/tree solution.
Pseudo code:
def NextBitset(B, c): # O(1) time
for each x in alphabet \ {c}:
B[x, i] = B[x, i-1]
B[c, i] = B[c, i-1] XOR 1
for each c in alphabet: # O(1) time
B[c,-1] = 0
map = new hash/tree map (bitset->int)
# first pass: # O(n)/O(nlogn) time
for i from 0 to len(s):
# Note, we override with the latest element.
B = NextBitset(B, s[i])
map[B] = i
for each c in alphabet: # O(1) time
B[c,-1] = 0
max_distance = 0
# second pass: O(n)/ O(nlogn) time.
for i from 0 to len(s):
B = NextBitset(B, s[i])
j = map.find(B) # O(1) / O(logn)
max_distance = max(max_distance, j-i)