Search code examples
c#randomparallel-processingthread-local-storage

Create ThreadLocal random generators with known seeds


I'm struggling to find a way to have a single random number generator per thread, while at the same time making sure that when the program is re-run, the same numbers are produced.

What I do now is something like this:

class Program {
    static void Main(string[] args) {

        var seed = 10;
        var data = new List<double>();
        var dataGenerator = new Random(seed);

        for (int i = 0; i < 10000; i++) {
            data.Add(dataGenerator.NextDouble());
        }

        var results = new ConcurrentBag<double>();

        Parallel.ForEach(data, (d) => {
            var result = Calculate(d, new Random(d.GetHashCode()); 
            results.Add(result);
        });

    }

    static double Calculate(double x, Random random) {
        return x * random.NextDouble();
    }
}

Because the randomgenerator that creates the 'data' list is provided a seed and the randomgenerators that are used in the calculation are provided a seed based on the hashcode of the number being processed, the results are repeatable. Regardless the number of threads and the order in which they are instantiated.

I'm wondering if it's possible to instantiate just a single randomgenerator for each thread. The following following piece of code seems to accomplish that, but because the random generators are not provided with a (reproducible) seed anymore, the results are not repeatable.

class Program {
    static void Main(string[] args) {

        var seed = 10;
        var data = new List<double>();
        var dataGenerator = new Random(seed);

        for (int i = 0; i < 10000; i++) {
            data.Add(dataGenerator.NextDouble());
        }

        var results = new ConcurrentBag<double>();

        var localRandom = new ThreadLocal<Random>(() => new Random());

        Parallel.ForEach(data, (d) => {
            var result = Calculate(d, localRandom.Value); 
            results.Add(result);
        });

    }

    static double Calculate(double x, Random random) {
        return x * random.NextDouble();
    }
}

Can anyone think of a nice solution to this problem?


Solution

  • It's possible, indeed you very nearly do it correctly in your question, but the problem is that that isn't quite what you want.

    If you seeded your thread-local Random with the same number each time, you would make the results deterministic within that thread, related to the number of previous operations. What you want is a pseudo-random number that is deterministic relative to the input.

    Well, you could just stick with Random(). It's not that heavy.

    Alternatively, you can have your own pseudo-random algorithm. Here's a simple example based on a re-hashing algorithm (intended to distribute the bits of hashcodes even better):

    private static double Calculate(double x)
    {
      unchecked
      {
        uint h = (uint)x.GetHashCode();
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >> 10);
        h += (h << 3);
        h ^= (h >> 6);
        h += (h << 2) + (h << 14);
        return (h ^ (h >> 16)) / (double)uint.MaxValue * x;
      }
    }
    

    This isn't a particularly good pseudo-random generator, but it's pretty fast. It also does no allocation and leads to no garbage collection.

    There-in lies the trade-off of this entire approach; you can simplify the above and be even faster but less "random" or you can be more "random" for more effort. I'm sure there's code out there that is both faster and more "random" than the above, which is more to demonstrate the approach than anything else, but among the rival algorithms you're looking at a trade-off of the quality of the generated number versus the performance. new Random(d).NextDouble() is at a particular point in that trade-off, other approaches are at other points.

    Edit: The re-hashing algorithm I used is a Wang/Jenkins hash. I couldn't remember the name when I wrote it.

    Edit: Having a better idea of your requirements from the comments, I'd now say that...

    You want to create a PRNG class, it could use the algorithm above, that of System.Random (taking reflected code as a starting point), the 128bitXorShift algorithm you mention or whatever. The important difference is that it must have a Reseed method. For example, if you copied System.Random's approach, your reseed would look like most of the constructor's body (indeed, you'd probably refactor so that apart from maybe creating the array it uses, the constructor would call into reseed).

    Then you'd create an instance per thread, and call .Reseed(d.GetHashCode()) at the point where you'd create a new Random in your existing code.

    Note also that this gives you another advantage, which is that if you depend upon consistent results from your PRNG (which it seems you do), then the fact that you are not promised a consistent algorithm in System.Random between framework versions (perhaps even including patches and security fixes) is a bad point for you, and this approach adds consistency.

    However, you are also not promised a consistent algorithm to double.GetHashCode(). I'd doubt they'd change it (unlike string.GetHashCode(), which is often changed), but just in case you could make your Reseed() take a double do something like:

    private static unsafe int GetSeedInteger(double d)
    {
      if(d == 0.0)
        return 0;
      long num = *((long*)&d);
      return ((int)num) ^ (int)(num >> 32);
    }
    

    Which pretty much just copies the current double.GetHashCode(), but now you'll be consistent in the face of framework changes.

    It might be worth considering breaking the set of tasks into chunks yourself, creating threads for each chunk, and then just creating this object as a local in the per-chunk method.

    Pros:

    Accessing ThreadLocal<T> is more expensive than accessing a local T.

    If the tasks are consistent in relative time to execute, you don't need a lot of Parallel.ForEach's cleverness.

    Cons:

    Parallel.ForEach is really good at balancing things out. What you're doing has to be very naturally balanced, or saving a lot on a pre-chunk basis, before eschewing its use gains you anything.