I'm trying to write an RNG that also returns the value of the updated seed. The perhaps obvious reason for this is so that new random variables can be added to the program later, without changing the values of the existing RNGs. For a python/numpy version of this issue see for example: Difference between RandomState and seed in numpy
Here's a example of usage with a (tentative) proposed solution:
program main
! note that I am assuming size=12 for the random
! seed but this is implementation specific and
! could be different in your implementation
! (you can query this with random_seed(size) btw)
integer :: iseed1(12) = 123456789
integer :: iseed2(12) = 987654321
do i = 1, 2
x = ran(iseed1)
y = ran(iseed2)
end do
end program main
function ran( seed )
integer, intent(inout) :: seed(12)
real :: ran
call random_seed( put=seed )
call random_number( ran )
call random_seed( get=seed )
end function ran
Note that a vital aspect of this solution (and any other solutions) is that if we added seed3
and x3
to the above, there would be no change to the realized values of x1
and x2
. Similarly, either x1
or x2
could be deleted from the code without affecting the values of the other one.
If it helps, this how the ran()
(extension) function works on Compaq/HP and Intel compilers and I'm basically trying to implement that same API on gfortran. Note, however, that the seed for that function is a scalar whereas it is a one-dimensional array using the Fortran 90 subroutine random_seed
(with the size/length of the array being implementation specific).
I am providing my current solution as a benchmark error but hope others can either critique that answer or provide better ones.
I would appreciate any analysis of the benchmark results according to standard PRNG tests and in particular, analysis of how the seed is set. In the benchmark I am merely using a scalar broadcast to an array which is very simple and concise and would like to avoid explicitly providing an entire array of integers.
So I would like either a somewhat rigorous confirmation that this is fine or else a more concise method of setting the seed (in a repeatable way!) than writing out an entire array of, say, 12 or 33 integers. E.g. maybe there is some simple and concise way to produce a nice stream of 12 pseudo-random integers from a single integer (as the array length is so short, this is probably easy?).
Edit to add: Followup question about setting seeds in fortran: Correctly setting random seeds for repeatability
Your proposed solution looks to me like it should work - you are recording the entire state of the generator (via get
), and swapping between streams when necessary (via put
). Note that I haven't actually tested your code, though.
This answer came about because a previous answer (now deleted) saved only the first element of the state array, and was using it to set the entire state array. It looked something like:
integer :: seed_scalar, state_array(12)
...
! -- Generate a random number from a thread that is stored as seed_scalar
state_array(:) = seed_scalar
call random_seed( put=state_array )
call random_number( ran )
call random_seed( get=state_array )
seed_scalar = state_array(1)
! -- discard state_array, store seed_scalar
This answer is intended to demonstrate that, for some compilers (gfortran 4.8 and 8.1, pgfortran 15.10), this method of maintaining separate threads via only a scalar results in bad RNG behavior.
Consider the following code to primitively test a random number generator. Many random numbers are generated - 100M for this example - and performance is monitored in two ways. First, counts for when the random number increases or decreases are recorded. Second, a histogram with bin width 0.01 is generated. (This is obviously a primitive test case, but it turns out to be sufficient to prove the point.). Finally, the estimated one-sigma standard deviation for all probabilities is also generated. This allows us to determine when variations are random or statistically significant.
program main
implicit none
integer, parameter :: wp = selected_real_kind(15,307)
integer :: N_iterations, state_size, N_histogram
integer :: count_incr, count_decr, i, loc, fid
integer, allocatable, dimension(:) :: state1, histogram
real(wp) :: ran, oldran, p, dp, re, rt
! -- Some parameters
N_iterations = 100000000
N_histogram = 100
call random_seed(size = state_size)
allocate(state1(state_size), histogram(N_histogram))
write(*,'(a,i3)') '-- State size is: ', state_size
! -- Initialize RNG, taken as today's date (also tested with other initial seeds)
state1(:) = 20180815
call random_seed(put=state1)
count_incr = 0
count_decr = 0
histogram = 0
ran = 0.5_wp ! -- To yield proprer oldran
! -- Loop to grab a batch of random numbers
do i=1,N_iterations
oldran = ran
! -- This preprocessor block modifies the RNG state in a manner
! -- consistent with storing only the first value of it
#ifdef ALTER_STATE
call random_seed(get=state1)
state1(:) = state1(1)
call random_seed(put=state1)
#endif
! -- Get the random number
call random_number(ran)
! -- Process Histogram
loc = CEILING(ran*N_histogram)
histogram(loc) = histogram(loc) + 1
if (oldran<ran) then
count_incr = count_incr + 1
elseif (oldran>ran) then
count_decr = count_decr + 1
else
! -- This should be statistically impossible
write(*,*) '** Error, same value?', ran, oldran
stop 1
endif
enddo
! -- For this processing, we have:
! re Number of times this event occurred
! rt Total number
! -- Probability of event is just re/rt
! -- One-sigma standard deviation is sqrt( re*(rt-re)/rt**3 )
! -- Write histogram
! -- For each bin, compute probability and uncertainty in that probability
open(newunit=fid, file='histogram.dat', action='write', status='replace')
write(fid,'(a)') '# i, P(i), dP(i)'
rt = real(N_iterations,wp)
do i=1,N_histogram
re = real(histogram(i),wp)
p = re/rt
dp = sqrt(re*(rt-1)/rt**3)
write(fid,'(i4,2es26.18)') i, p, dp
enddo
close(fid)
! -- Probability of increase and decrease
re = real(count_incr,wp)
p = re/rt
dp = sqrt(re*(rt-1)/rt**3)
write(*,'(a,f20.15)') 'Probability of increasing: ', p
write(*,'(a,f20.15)') ' one-sigma deviation: ', dp
re = real(count_decr,wp)
p = re/rt
dp = sqrt(re*(rt-1)/rt**3)
write(*,'(a,f20.15)') 'Probability of decreasing: ', p
write(*,'(a,f20.15)') ' one-sigma deviation: ', dp
write(*,'(a)') 'complete'
end program main
Without the preprocessor directive ALTER_STATE
, we are using gfortran
's built-in PRNG as intended, and the results are as expected:
enet-mach5% gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
enet-mach5% gfortran -cpp -fcheck=all main.f90 && time ./a.out
-- State size is: 12
Probability of increasing: 0.499970710000000
one-sigma deviation: 0.000070708606619
Probability of decreasing: 0.500029290000000
one-sigma deviation: 0.000070712748851
complete
real 0m2.414s
user 0m2.408s
sys 0m0.004s
The expected probability for increasing/decreasing is 0.5, and both are with the estimated uncertainty (0.49997 is less than 0.00007 from 0.5). The histogram, plotted with error bars, is
For each bin, the variation from the expected probability (0.01) is small, and typically within the estimated uncertainty. Because we have generated many numbers, all variations are small (order 0.1%). Basically, this test hasn't found any suspicious behavior.
If we enable the block inside ALTER_STATE
, we are modifying the internal state of the random number generator each time a number is generated. This is intended to mimic a now-deleted solution, which stored only the first value of the state. The results are:
enet-mach5% gfortran -cpp -DALTER_STATE -fcheck=all main.f90 && time ./a.out
-- State size is: 12
Probability of increasing: 0.501831930000000
one-sigma deviation: 0.000070840096343
Probability of decreasing: 0.498168070000000
one-sigma deviation: 0.000070581021884
complete
real 0m16.489s
user 0m16.492s
sys 0m0.000s
The observed probability of increasing is far outside the expected variation (26 sigma!). This already indicates something is wrong. The histogram is:
Note that the range in y has changed substantially. Here, we have a variation of approximately two orders of magnitude larger than the previous case, far outside of the expected variation. The error bars are hard to see, here, because the y-range is so much larger. If my random number generator performed like this, I wouldn't feel comfortable using it for anything, not even a coin flip.
The put
and get
options of random_seed
access the processor-dependent internal state of the random number generator. This typically has more entropy than a single number, and its form is processor-dependent. There's no guarantee that the first number is at all representative of the entire state.
If you're initializing a random seed once, and generating many times, using a single scalar is fine. However, it's clearly necessary to store more than a single number if you intend to use that state to generate every single number.
I'm frankly a little bit surprised this primitive test was able to demonstrate the bad behavior. Validity of RNG is a complex subject, and I am by no means an expert. The results were also compiler-dependent:
It's possible that a larger state size results in more entropy loss when saving only a single value, so it's correlated with poorer behavior. That certainly fits with my observations. Without knowing the internals of each generator, though, it's impossible to tell.