Search code examples
pythonalgorithmasymptotic-complexityspace-complexity

Why is the following algorithm O(1) space?


Input: A list of positive integers where one entry occurs exactly once, and all other entries occur exactly twice (for example [1,3,2,5,3,4,1,2,4])

Output: The unique entry (5 in the above example)

The following algorithm is supposed to be O(m) time and O(1) space where m is the size of the list.

def get_unique(intlist):
    unique_val = 0
    for int in intlist:
        unique_val ^= int
    return unique_val

My analysis: Given a list of length m there will be (m + 1)/2 unique positive integers in the input list, so that the smallest possible maximum integer in the list will be (m+1)/2. If we assume this best case, then when taking an XOR sum the variable unique_val will require ceiling(log((m+1)/2)) bits in memory, so I thought the space complexity should be at least O(log(m)).


Solution

  • Your analysis is certainly one correct answer, particularly in a language like Python which gracefully handles arbitrarily large numbers.

    It's important to be clear about what you're trying to measure when thinking about space and time complexity. A reasonable assumption might be that the size of an integer is constant (e.g. you're using 64-bit integers). In that case, the space complexity is certainly O(1), but the time complexity is still O(m).

    Now, you could also argue that using a fixed-size integer means you have a constant upper-bound on the size of m, so perhaps the time complexity is also O(1). But in most cases where you need to analyze the running time of this sort of algorithm, you're probably very interested in the difference between a list of length 10 and one of length 1 billion.

    I'd say it's important to clarify and state your assumptions when analyzing space- and time-complexity. In this case, I would assume we have a fixed size integer and a value of m much smaller than the maximum integer value. In that case, O(1) space and O(m) time are probably the best answers.

    EDIT (based on discussion in other answers)

    Since all m gives you is a lower-bound no the maximum value in the list, you really can't provide a worst-case estimate of the space. I.e. a number in the list can be arbitrarily large. To have any reasonable answer as to the space complexity of this algorithm, you need to make some assumption about the maximum size of the input values.