Search code examples
fortrangfortranfortran90intel-fortran

Random number generator (RNG/PRNG) that returns updated value of seed


I'm trying to write an RNG that also returns the value of the updated seed. The perhaps obvious reason for this is so that new random variables can be added to the program later, without changing the values of the existing RNGs. For a python/numpy version of this issue see for example: Difference between RandomState and seed in numpy

Here's a example of usage with a (tentative) proposed solution:

program main

   ! note that I am assuming size=12 for the random 
   ! seed but this is implementation specific and
   ! could be different in your implementation
   ! (you can query this with random_seed(size) btw)

   integer :: iseed1(12) = 123456789
   integer :: iseed2(12) = 987654321

   do i = 1, 2
      x = ran(iseed1)
      y = ran(iseed2)
   end do

end program main

function ran( seed )

   integer, intent(inout) :: seed(12)
   real                   :: ran

   call random_seed( put=seed )
   call random_number( ran )
   call random_seed( get=seed )

end function ran

Note that a vital aspect of this solution (and any other solutions) is that if we added seed3 and x3 to the above, there would be no change to the realized values of x1 and x2. Similarly, either x1 or x2 could be deleted from the code without affecting the values of the other one.

If it helps, this how the ran() (extension) function works on Compaq/HP and Intel compilers and I'm basically trying to implement that same API on gfortran. Note, however, that the seed for that function is a scalar whereas it is a one-dimensional array using the Fortran 90 subroutine random_seed (with the size/length of the array being implementation specific).

I am providing my current solution as a benchmark error but hope others can either critique that answer or provide better ones.

I would appreciate any analysis of the benchmark results according to standard PRNG tests and in particular, analysis of how the seed is set. In the benchmark I am merely using a scalar broadcast to an array which is very simple and concise and would like to avoid explicitly providing an entire array of integers.

So I would like either a somewhat rigorous confirmation that this is fine or else a more concise method of setting the seed (in a repeatable way!) than writing out an entire array of, say, 12 or 33 integers. E.g. maybe there is some simple and concise way to produce a nice stream of 12 pseudo-random integers from a single integer (as the array length is so short, this is probably easy?).

Edit to add: Followup question about setting seeds in fortran: Correctly setting random seeds for repeatability


Solution

  • Your proposed solution looks to me like it should work - you are recording the entire state of the generator (via get), and swapping between streams when necessary (via put). Note that I haven't actually tested your code, though.

    This answer came about because a previous answer (now deleted) saved only the first element of the state array, and was using it to set the entire state array. It looked something like:

    integer :: seed_scalar, state_array(12)
    ...
    ! -- Generate a random number from a thread that is stored as seed_scalar
    state_array(:) = seed_scalar
    call random_seed( put=state_array )
    call random_number( ran )
    call random_seed( get=state_array )
    seed_scalar = state_array(1)
    ! -- discard state_array, store seed_scalar
    

    This answer is intended to demonstrate that, for some compilers (gfortran 4.8 and 8.1, pgfortran 15.10), this method of maintaining separate threads via only a scalar results in bad RNG behavior.

    Consider the following code to primitively test a random number generator. Many random numbers are generated - 100M for this example - and performance is monitored in two ways. First, counts for when the random number increases or decreases are recorded. Second, a histogram with bin width 0.01 is generated. (This is obviously a primitive test case, but it turns out to be sufficient to prove the point.). Finally, the estimated one-sigma standard deviation for all probabilities is also generated. This allows us to determine when variations are random or statistically significant.

    program main
       implicit none
       integer, parameter :: wp = selected_real_kind(15,307)
       integer :: N_iterations, state_size, N_histogram
       integer :: count_incr, count_decr, i, loc, fid
       integer, allocatable, dimension(:) :: state1, histogram
       real(wp) :: ran, oldran, p, dp, re, rt
    
       ! -- Some parameters
       N_iterations = 100000000
       N_histogram = 100
    
       call random_seed(size = state_size)
       allocate(state1(state_size), histogram(N_histogram))
       write(*,'(a,i3)') '-- State size is: ', state_size
    
       ! -- Initialize RNG, taken as today's date (also tested with other initial seeds)
       state1(:) = 20180815
       call random_seed(put=state1)
    
       count_incr = 0
       count_decr = 0
       histogram = 0
       ran = 0.5_wp      ! -- To yield proprer oldran
    
       ! -- Loop to grab a batch of random numbers
       do i=1,N_iterations
          oldran = ran
    
          ! -- This preprocessor block modifies the RNG state in a manner
          ! -- consistent with storing only the first value of it
    #ifdef ALTER_STATE
          call random_seed(get=state1)
          state1(:) = state1(1)
          call random_seed(put=state1)
    #endif
    
          ! -- Get the random number
          call random_number(ran)
    
          ! -- Process Histogram
          loc = CEILING(ran*N_histogram)
          histogram(loc) = histogram(loc) + 1
    
          if (oldran<ran) then
             count_incr = count_incr + 1
          elseif (oldran>ran) then
             count_decr = count_decr + 1
          else
             ! -- This should be statistically impossible
             write(*,*) '** Error, same value?', ran, oldran
             stop 1
          endif
       enddo
    
       ! -- For this processing, we have:
       !     re    Number of times this event occurred
       !     rt    Total number
       ! -- Probability of event is just re/rt
       ! -- One-sigma standard deviation is sqrt( re*(rt-re)/rt**3 )
    
       ! -- Write histogram
       ! -- For each bin, compute probability and uncertainty in that probability
       open(newunit=fid, file='histogram.dat', action='write', status='replace')
       write(fid,'(a)') '# i, P(i), dP(i)'
    
       rt = real(N_iterations,wp)
       do i=1,N_histogram
          re = real(histogram(i),wp)
          p = re/rt
          dp = sqrt(re*(rt-1)/rt**3)
    
          write(fid,'(i4,2es26.18)') i, p, dp
       enddo
       close(fid)
    
       ! -- Probability of increase and decrease
       re = real(count_incr,wp)
       p = re/rt
       dp = sqrt(re*(rt-1)/rt**3)
       write(*,'(a,f20.15)') 'Probability of increasing: ', p
       write(*,'(a,f20.15)') '      one-sigma deviation: ', dp
    
       re = real(count_decr,wp)
       p = re/rt
       dp = sqrt(re*(rt-1)/rt**3)
       write(*,'(a,f20.15)') 'Probability of decreasing: ', p
       write(*,'(a,f20.15)') '      one-sigma deviation: ', dp
    
       write(*,'(a)') 'complete'
    
    end program main
    

    Without Modification

    Without the preprocessor directive ALTER_STATE, we are using gfortran's built-in PRNG as intended, and the results are as expected:

    enet-mach5% gfortran --version
    GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
    Copyright (C) 2013 Free Software Foundation, Inc.
    
    GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
    You may redistribute copies of GNU Fortran
    under the terms of the GNU General Public License.
    For more information about these matters, see the file named COPYING
    
    enet-mach5% gfortran -cpp -fcheck=all main.f90 && time ./a.out
    -- State size is:  12
    Probability of increasing:    0.499970710000000
          one-sigma deviation:    0.000070708606619
    Probability of decreasing:    0.500029290000000
          one-sigma deviation:    0.000070712748851
    complete
    
    real    0m2.414s
    user    0m2.408s
    sys     0m0.004s
    

    The expected probability for increasing/decreasing is 0.5, and both are with the estimated uncertainty (0.49997 is less than 0.00007 from 0.5). The histogram, plotted with error bars, is

    Histogram for Correct Usage

    For each bin, the variation from the expected probability (0.01) is small, and typically within the estimated uncertainty. Because we have generated many numbers, all variations are small (order 0.1%). Basically, this test hasn't found any suspicious behavior.

    With Modification

    If we enable the block inside ALTER_STATE, we are modifying the internal state of the random number generator each time a number is generated. This is intended to mimic a now-deleted solution, which stored only the first value of the state. The results are:

    enet-mach5% gfortran -cpp -DALTER_STATE -fcheck=all main.f90 && time ./a.out
    -- State size is:  12
    Probability of increasing:    0.501831930000000
          one-sigma deviation:    0.000070840096343
    Probability of decreasing:    0.498168070000000
          one-sigma deviation:    0.000070581021884
    complete
    
    real    0m16.489s
    user    0m16.492s
    sys     0m0.000s
    

    The observed probability of increasing is far outside the expected variation (26 sigma!). This already indicates something is wrong. The histogram is:

    Histogram for Incorrect Usage

    Note that the range in y has changed substantially. Here, we have a variation of approximately two orders of magnitude larger than the previous case, far outside of the expected variation. The error bars are hard to see, here, because the y-range is so much larger. If my random number generator performed like this, I wouldn't feel comfortable using it for anything, not even a coin flip.

    Closing

    The put and get options of random_seed access the processor-dependent internal state of the random number generator. This typically has more entropy than a single number, and its form is processor-dependent. There's no guarantee that the first number is at all representative of the entire state.

    If you're initializing a random seed once, and generating many times, using a single scalar is fine. However, it's clearly necessary to store more than a single number if you intend to use that state to generate every single number.

    I'm frankly a little bit surprised this primitive test was able to demonstrate the bad behavior. Validity of RNG is a complex subject, and I am by no means an expert. The results were also compiler-dependent:

    • The results and histograms shown are for gfortran 4.8, which has a state size of 12.
    • Intel 16.0 uses a state size of 2, and this test didn't show any undesirable behavior.
    • gfortran 8.1.0 has a state size of 33, and PGI 15.10 has a state size of 34. Their behavior was the same, and the worst yet. When setting the entire state to a single value, subsequent random numbers are always identical. These generators require a 'priming' of around 30 generations before the numbers start looking reasonable, when initialized from a a single seed. When saving only a single seed in the state, this priming never occurs.

    It's possible that a larger state size results in more entropy loss when saving only a single value, so it's correlated with poorer behavior. That certainly fits with my observations. Without knowing the internals of each generator, though, it's impossible to tell.