Search code examples
c#algorithmlinqlazy-evaluationshuffle

How to implement lazy shuffling of Lists in C#?


I am looking for an implementation of lazy shuffling in c#.

I only care about the time it takes to process the first couple of elements. I do not care whether or not the original list gets modified (i.e. removing elements would be fine). I do not care if the processing time gets longer as the iterator reaches the end of the list (as long as it stays within reasonable bounds of course).

Context: I have a large list, that I want to get a relatively small number of random samples from. In most cases I only need the very first random element, but in same rare cases I need all elements from the list.

If possible I would like to implement this as an extension method, like this (but answers without extension methods are fine too):

public static class Program
{
    public static IEnumerable<T> lazy_shuffle<T>(this IEnumerable<T> input, Random r)
    {
        //do the magic
        return input;
    }
    static void Main(string[] args)
    {
        var start = DateTime.Now;
        var shuffled = Enumerable.Range(0, 1000000).lazy_shuffle(new Random(123));
        var enumerate = shuffled.GetEnumerator();
        foreach (var i in Enumerable.Range(0, 5))
        {
            enumerate.MoveNext();
            Console.WriteLine(enumerate.Current);
        }
        Console.WriteLine($"time for shuffling 1000000 elements was {(DateTime.Now - start).TotalMilliseconds}ms");
    }
}

Note:

  • input.OrderBy(i => r.Next()) would not be good enough, as it needs to iterate over the entire list once the generate one random number for each element of the list.
  • this is not a duplicate of Lazy Shuffle Algorithms because my question has less tight bounds for the algorithms but instead requires an implementation in c#
  • this is not a duplicate of Randomize a List<T> because that question is about regular shuffling and not lazy shuffling.

update:

  • A Count exists. Random Access to elements exists. It is not strictly an ienumerable, and instead just a big List or Array. I have update the question to say "list" instead of "ienumerable". Only the output of the lazy-shuffler needs to be enumerable, the source can be an actual list.
  • The selection should be fair, i.e. each element needs to have the same chance to be picked first.
  • mutation/modification of the source-list is fine
  • In the end I only need to take N random elements from the list, but I do not know the N beforehand

Solution

  • Since the original list can be modified, here is a very simple and efficient solution, based on this answer:

    public static IEnumerable<T> Shuffle<T>(this IList<T> list, Random rng)
    {
        for(int i = list.Count - 1; i >= 0; i--)
        {
            int swapIndex = rng.Next(i + 1);
            yield return list[swapIndex];
            list[swapIndex] = list[i];
        }
    }