algorithm time-complexity xor space-complexity

Find the k non-repeating elements in a list with "little" additional space

The original problem statement is this one:

Given an array of 32bit unsigned integers in which every number appears exactly twice except three of them (which appear exactly once), find those three numbers in O(n) time using O(1) extra space. The input array is read-only. What if there are k exceptions instead of 3?

It's easy to solve this in O(1) time and O(1) space if you accept a very high constant factor, because the input restriction bounds the array length (it can have at most 2³³ entries):

for i in lst:
    if sum(1 for j in lst if i == j) == 1:
        print(i)

So, for the sake of this question, let's drop the restriction in bit length and concentrate on the more general problem where the numbers can have up to m bits.

Generalizing an algorithm for k = 2, what I had in mind is the following:

XOR those numbers with a least significant bit of 1 and those with a 0 separately. If the resulting value is non-zero for both partitions, we know that we have partitioned the non-repeating numbers into two groups, each of which has at least one member.
For each of those groups, try to partition it further by examining the second-least significant bit, and so on.
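
For reference, here is a minimal sketch of the commonly used k = 2 technique that the steps above generalize. It assumes exactly two values occur once and all others occur exactly twice; the function name `two_singles` is only illustrative. It also splits on the lowest bit in which the two values differ rather than walking the bits from the least significant one upwards.

```python
from functools import reduce
from operator import xor

def two_singles(nums):
    """Return the two values that occur once; all others occur exactly twice."""
    # Pairs cancel out, so this equals a ^ b, which is non-zero because a != b.
    combined = reduce(xor, nums, 0)
    # Isolate the lowest set bit: a and b differ in this bit.
    low_bit = combined & -combined
    # XOR only the numbers that have this bit set: pairs cancel, one single remains.
    a = reduce(xor, (x for x in nums if x & low_bit), 0)
    return a, combined ^ a

print(two_singles([1, 1, 2, 3, 4, 2]))
```

The input is scanned twice and only a constant number of m-bit values is kept, which is why this case is easy; the difficulty for larger k is finding a bit that is guaranteed to split the singles.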

There is a special case to be considered, though. If, after partitioning a group, the XOR value of one of the resulting sub-groups is zero, we don't know whether that sub-group is empty or not. In this case my algorithm just leaves this bit out and continues with the next one, which is incorrect. For example, it fails for the input [0,1,2,3,4,5,6].
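
One way to see the failure on [0,1,2,3,4,5,6] is to compute the two partition XORs for each of the three relevant bits. The helper `partition_xors` below is a hypothetical name used only for this illustration, not part of the original code.

```python
from functools import reduce
from operator import xor

def partition_xors(nums, bit):
    """XOR of the numbers with the given bit set, and of those with it clear."""
    mask = 1 << bit
    ones = reduce(xor, (x for x in nums if x & mask), 0)
    zeros = reduce(xor, (x for x in nums if not x & mask), 0)
    return ones, zeros

nums = [0, 1, 2, 3, 4, 5, 6]
for bit in range(3):
    # Each bit gives (7, 0): the "bit clear" group is non-empty but XORs to zero.
    print(bit, partition_xors(nums, bit))
```

Since the zero-bit group XORs to zero for every bit, a plain XOR test never finds a bit it can safely split on, even though all seven values are unique.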

Now the idea I had was to compute not only the XOR of the elements, but also the XOR of the values after applying a certain function (I had chosen f(x) = 3x + 1 here). See Evgeny's answer below for a counter-example for this additional check.

Although the algorithm below is not correct for k >= 7, I still include the implementation here to give you an idea:

from functools import reduce

def xor(seq):
  return reduce(lambda x, y: x ^ y, seq, 0)

def compute_xors(ary, mask, bits):
  # XOR of the selected elements, and XOR of f(x) = 3x + 1 over the same elements
  a = xor(i for i in ary if i & mask == bits)
  b = xor(i * 3 + 1 for i in ary if i & mask == bits)
  # None means both XORs are zero, so the group may be empty
  return a if max(a, b) > 0 else None

def solve(ary, high = 0, mask = 0, bits = 0, old_xor = 0):
  for h in range(high, 32):
    hibit = 1 << h
    m = mask | hibit
    # partition the array into two groups
    x = compute_xors(ary, m, bits | hibit)
    y = compute_xors(ary, m, bits)
    if x is None or y is None:
      # at this point, we can't be sure if both groups are non-empty,
      # so we check the next bit
      continue
    mask |= hibit
    # we recurse if we are absolutely sure that we can find at least one
    # new value in both branches. This means that the number of recursions
    # is linear in k, rather than exponential.
    solve(ary, h + 1, mask, bits | hibit, x)
    solve(ary, h + 1, mask, bits, y)
    break
  else:
    # we couldn't find a partitioning bit, so we output (but
    # this might be incorrect, see above!)
    print(old_xor)

# expects input of the form "10 1 1 2 3 4 2 5 6 7 10"
# (a list, not an iterator, because the array is scanned many times)
ary = list(map(int, input().split()))
solve(ary, old_xor=xor(ary))

From my analysis, this code has a worst-case time complexity of O(k * m² * n) where n is the number of input elements (XORing is O(m) and at most k partitioning operations can be successful) and space complexity O(m²) (because m is the maximum recursion depth and the temporary numbers can be of length m).

The question is of course if there is a correct, efficient approach with good asymptotic runtime (let's assume that k << n and m << n here for the sake of completeness), which also needs little additional space (for example, approaches that sort the input will not be accepted, because we'd need at least O(n) additional space for that, as we can't modify the input!).

EDIT: Now that the algorithm above is proven to be incorrect, it would of course be nice to see how it could be made correct, possibly by making it a bit less efficient. Space complexity should be in o(n*m) (that is, sublinear in the total number of input bits). It would be okay to take k as an additional input if that makes the task easier.

Solution

One probabilistic approach to take would be to use a counting filter.

The algorithm is as follows:

1. Linearly scan the array and 'update' the counting filter.
2. Linearly scan the array again and collect all elements whose count in the filter is not 2 or more. These are all real solutions, but there may be fewer than k of them. (The false positives in this case are unique elements which look like they aren't.)
3. Choose a new basis of hash functions and repeat until we have all k solutions.
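
The steps above could be sketched roughly as follows. This is a simplified illustration, not a tuned implementation: it assumes k is known, uses a single randomly drawn hash function per round instead of several, and all names (`find_singles`, `num_counters`, `max_rounds`, `PRIME`) are hypothetical choices made for the example.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime, larger than any 32-bit input value

def find_singles(arr, k, num_counters=64, max_rounds=1000, seed=0):
    """Return up to k values that occur exactly once in arr."""
    rng = random.Random(seed)
    found = set()
    for _ in range(max_rounds):
        if len(found) >= k:
            break
        # New round: draw a fresh hash h(x) = ((a*x + b) mod PRIME) mod num_counters.
        a = rng.randrange(1, PRIME)
        b = rng.randrange(PRIME)
        slot = lambda x: ((a * x + b) % PRIME) % num_counters
        # First scan: update the counters, saturating at 3 so each fits in 2 bits.
        counters = [0] * num_counters
        for x in arr:
            counters[slot(x)] = min(counters[slot(x)] + 1, 3)
        # Second scan: a counter of exactly 1 means x was the only occurrence
        # hashed to that slot, so x is certainly unique.
        for x in arr:
            if counters[slot(x)] == 1:
                found.add(x)
    return found

print(sorted(find_singles([10, 1, 1, 2, 3, 4, 2, 5, 6, 7, 10], k=5)))
```

An element is only ever reported when its counter is exactly 1, so the result never contains a repeated value; the randomness affects only how many rounds are needed. Besides the counters, the sketch stores the up to k values found so far.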

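This uses 2m bits of space (independent of n). The time complexity is more involved, but since the probability that any given unique element is not found in step 2 is approximately (1 - e^(-kn/m))^k, we will converge to a solution very quickly. Unfortunately, we are not quite linear in n. (The expression is the standard Bloom-filter false-positive estimate, so here m presumably denotes the number of counters, k the number of hash functions, and n the number of distinct elements inserted, rather than the m and k of the question.)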
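
To get a feel for the estimate, the small sketch below evaluates (1 - e^(-kn/m))^k for given parameters. It assumes the standard Bloom-filter reading of the symbols (number of hash functions, number of distinct inserted elements, number of counters) and, as a further simplifying assumption, treats rounds as independent; the function names are illustrative only.

```python
import math

def miss_probability(num_hashes, num_items, num_counters):
    """Estimate (1 - e^(-k*n/m))^k for one round of the filter."""
    return (1.0 - math.exp(-num_hashes * num_items / num_counters)) ** num_hashes

def still_missing_after(rounds, num_hashes, num_items, num_counters):
    """Chance that one unique element is missed in every round (independence assumed)."""
    return miss_probability(num_hashes, num_items, num_counters) ** rounds

print(miss_probability(1, 8, 64))        # roughly 0.1175
print(still_missing_after(5, 1, 8, 64))  # shrinks geometrically with the rounds
```

Because the per-round miss probability is raised to the number of rounds, a handful of rounds is usually enough as long as the filter is not much smaller than the number of distinct elements.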

I appreciate that this doesn't satisfy your constraints, as it is super-linear in time and probabilistic, but given that the original conditions may not be satisfiable, this approach may be worth considering.