My goal is to sample k integers from 0, ..., n-1 without duplication. The order of the sampled integers doesn't matter. At each call (which occurs very often), n and k vary slightly, but not by much (n is about 250,000 and k is about 2,000). I've come up with the following amortized O(k) algorithm:
I think uniform sampling of permutations/combinations should be an exhaustively studied problem, so either (1) there is a much better solution, or at least (2) my solution is a (minor modification of) a well-known one. Either way, please help me. Thanks.
If k is much less than n -- say, less than half of n -- then the most efficient solution is to keep the numbers already generated in a hash table (actually, a hash set, since there is no value associated with a key). If a random number happens to already be in the hash table, reject it and generate another one in its place. With the actual values of k and n suggested (k ∼ 2,000; n ∼ 250,000), the expected number of rejections needed to generate k unique samples is less than 10, so the overhead will hardly be noticeable. The size of the hash table is O(k), and it can simply be deleted at the end of the sample generation.
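To make the rejection approach concrete, here is a minimal Python sketch; the function name, the use of Python's built-in set, and the rng parameter are my own illustrative choices, not part of the original question:

```python
import random

def sample_by_rejection(n, k, rng=random):
    """Sample k distinct integers from range(n) by rejecting duplicates.

    The expected number of rejected draws is roughly k*k / (2*(n - k))
    when k is much smaller than n, i.e. under 10 for k ~ 2,000 and
    n ~ 250,000, so the loop is effectively O(k).
    """
    chosen = set()
    while len(chosen) < k:
        r = rng.randrange(n)   # may hit an already-chosen value; just retry
        chosen.add(r)          # adding a duplicate to a set is a no-op
    return list(chosen)
```

The set is rebuilt from scratch on every call, which matches the note above that the hash table can simply be deleted once the sample has been generated.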
It is also possible to simulate the Fisher-Yates (Knuth) shuffle algorithm using a hash table instead of a vector of n values, thereby avoiding having to reject generated random numbers. If you were using a vector A, you would start by initializing A[i] to i, for every 0 ≤ i < n. With the hash table H, you start with an empty hash table, and use the convention that H[i] is considered to be i if the key i is not in the hash table. Step 3 in your algorithm -- "swap A[r] with A[i]" -- becomes "add H[r] as the next element of the sample and set H[r] to H[i]". Note that it is unnecessary to set H[i], because that element will never be referred to again: all subsequent random numbers r are generated from a range which does not include i.
Because the hash table in this case contains both keys and values, it is larger than the hash set used in alternative 1, above, and the increased size (and the consequent increase in memory cache misses) is likely to cause more overhead than is saved by eliminating rejections. However, it has the advantage of working even if k is occasionally close to n.
Finally, in your proposed algorithm, it is actually quite easy to restore A in O(k) time. A value A[j] will have been modified by the algorithm only if:

a. n − k ≤ j < n, or
b. there is some i such that n − k ≤ i < n and A[i] = j.

Consequently, you can restore the vector A by looking at each A[i] for n − k ≤ i < n: first, if A[i] < n − k, set A[A[i]] to A[i]; then, unconditionally set A[i] to i.