Tags: algorithm, stream, space-complexity

Lower bound on space usage for finding the median of a stream of 4-bit integers. Why is it 'log n' bits?


Today in my algorithms class, the professor said that the lower bound on space usage (in bits) for finding the median of a stream of n 4-bit integers is log n. Any idea why this is true?


Solution

  • Intuitively, Θ(log n) bits is enough: you can store how many times you've seen each of the 16 possible 4-bit values (each counter holds a number up to n, so it needs O(log n) bits, and there are only 16 counters), and from those counts you can compute the median. The intuition for why you can't (asymptotically) improve on this is that if you use fewer bits, you can't even remember how many times you've seen each of the numbers, so you can't always return the median.
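    To make that upper bound concrete, here's a minimal sketch of the counting approach in Python (the function name and the lower-median convention for even n are my own choices, not part of the original argument):

    ```python
    def stream_median(stream):
        # One counter per possible 4-bit value. Each counter holds a number
        # up to n, so it needs O(log n) bits; 16 of them is still O(log n).
        counts = [0] * 16
        n = 0
        for x in stream:
            counts[x] += 1
            n += 1
        # Walk the histogram until we pass the middle position.
        middle = (n - 1) // 2            # lower median if n is even
        seen = 0
        for value in range(16):
            seen += counts[value]
            if seen > middle:
                return value

    print(stream_median([3, 7, 3, 1, 15]))   # 3
    ```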

    The gist of the formal argument I'm about to make here is the following. Imagine I stream the first half of my input into your algorithm. If you don't have enough bits of memory, you can't uniquely remember what that input was. And if you can't remember what that input was, then I can force your algorithm to give the wrong answer by maliciously choosing the back half of the sequence of numbers.

    To formalize this, let's suppose that you have an algorithm that claims to solve this problem using o(log n) (that's little-o of log n, by the way) bits of memory. Now, suppose I have a "sufficiently large" stream of n = 2k + 1 numbers, each of which is four bits long. Since you're using o(log n) bits of memory and I've picked n to be "sufficiently large," we can say that your algorithm uses strictly fewer than, say, log(n - 1) - 1 = log(2k) - 1 = 1 + log k - 1 = log k bits of memory.
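    As a quick sanity check on that arithmetic, with a hypothetical k chosen as a power of two so the logarithms come out to integers:

    ```python
    import math

    k = 1024                       # hypothetical choice; n = 2k + 1 = 2049
    n = 2 * k + 1
    # log(n - 1) - 1 = log(2k) - 1 = 1 + log k - 1 = log k
    assert math.log2(n - 1) - 1 == math.log2(k) == 10.0
    ```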

    Now, consider the following k+1 choices for a sequence of k 4-bit numbers to stream through your algorithm. The first one is k copies of 0000. The second one is k-1 copies of 0000 followed by 1 copy of 1111. The third one is k-2 copies of 0000 followed by 2 copies of 1111. And more generally, there's one sequence for each of the k+1 different ways to split the k elements between copies of 0000 and copies of 1111.

    Now, run each of these k+1 possible options through your algorithm. You are using strictly fewer than log k bits of memory, and so there are fewer than 2^(log k) = k possible combinations that those bits of memory can be in. And that's a problem, because I have k+1 different sequences. Therefore, there must be two of those sequences that, when run through your algorithm, cause the memory of the algorithm to end up in the same state. Let's suppose that the first one has s copies of 0000, and the second has t copies of 0000, with s < t.
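    Here's a small sketch of that pigeonhole step. The "algorithm" below is hypothetical: with only b bits of memory it can remember the count of 1111s only modulo 2^b, so two of the k+1 prefixes necessarily drive it into the same state:

    ```python
    def prefixes(k):
        # The k+1 fooling inputs: k-i copies of 0000 followed by i copies
        # of 1111, for each i in 0..k.
        return [[0b0000] * (k - i) + [0b1111] * i for i in range(k + 1)]

    def toy_state(prefix, b=2):
        # Hypothetical algorithm with only b bits of memory: all it can
        # retain is the number of 1111s seen so far, modulo 2**b.
        return sum(x == 0b1111 for x in prefix) % (2 ** b)

    seen = {}
    for p in prefixes(8):                # k = 8
        state = toy_state(p)
        if state in seen:
            # Two distinct prefixes, one memory state: from here on the
            # algorithm cannot tell them apart.
            print("collision:", seen[state].count(0b1111), "vs",
                  p.count(0b1111), "copies of 1111")
            break
        seen[state] = p
    ```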

    Notice that we've only fed k elements of the stream into your algorithm, so we still have k+1 remaining elements to pick. And what if I choose them so that there are exactly k-s copies of 0000, with the remaining s+1 elements being 1111? Well, in that case, look what happens (there's a short sketch after this list that checks the counts).

    • Take the original sequence of s copies of 0000 followed by k-s copies of 1111 and run it through the algorithm. Then give it k-s copies of 0000 and s+1 copies of 1111. The overall stream now has k copies of 0000 and k+1 copies of 1111, which means that the median is 1111.
    • Take the original sequence of t > s copies of 0000 followed by k-t < k-s copies of 1111 and run it through the algorithm. Then give it k-s copies of 0000 and s+1 copies of 1111. The overall stream now has at least k+1 copies of 0000 and at most k copies of 1111, so the median should be 0000.
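    Here's the sketch promised above, checking those counts with hypothetical small values of k, s, and t:

    ```python
    def median(stream):
        return sorted(stream)[len(stream) // 2]   # n = 2k + 1 is odd

    k, s, t = 4, 1, 3                  # hypothetical example with s < t
    suffix = [0b0000] * (k - s) + [0b1111] * (s + 1)

    first  = [0b0000] * s + [0b1111] * (k - s) + suffix   # k zeros, k+1 ones
    second = [0b0000] * t + [0b1111] * (k - t) + suffix   # >= k+1 zeros

    print(median(first))    # 15, i.e. 1111
    print(median(second))   # 0,  i.e. 0000
    ```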

    But here we run into a problem. The state of the algorithm is identical after seeing the first half of the input, and we've fed the same sequence in as the back half of the input, so the algorithm must behave the same way in the two cases above. But it can't, because, as we just saw, the outputs are supposed to be different. This is impossible for any deterministic algorithm!

    This style of argument, by the way, is based on the idea of a fooling set: a set of inputs such that, for any two of them, there's some suffix you can tack on that distinguishes them. It's related to the Myhill-Nerode theorem for regular languages, and you can think of any deterministic streaming algorithm with bounded memory as a DFA with one state per combination of bits of memory.
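    To make the DFA analogy concrete, here's a minimal sketch: a deterministic streaming algorithm is just a transition function over at most 2^b states, and once two prefixes land in the same state, no common suffix can ever separate them. The 1-bit "parity of 1111s" machine is a made-up example:

    ```python
    def run(delta, start, stream):
        # A deterministic streaming algorithm with b bits of memory is a
        # DFA with at most 2**b states over the 16-symbol alphabet.
        state = start
        for symbol in stream:
            state = delta[state, symbol]
        return state

    # Toy 1-bit machine: tracks only the parity of the number of 1111s.
    delta = {(s, x): s ^ (x == 0b1111) for s in (0, 1) for x in range(16)}

    p, q = [0b1111], [0b1111, 0b0000, 0b1111, 0b1111]   # same parity
    assert run(delta, 0, p) == run(delta, 0, q)          # same state
    w = [0b1111, 0b0000]                                 # any common suffix
    assert run(delta, 0, p + w) == run(delta, 0, q + w)  # still same state
    ```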