Search code examples
performancecomparisonjuliamontecarlo

Faster vector comparison in Julia


Im trying to construct and compare, the fastest possible way, two 01 random vectors of the same length using Julia, each vector with the same number of zeros and ones.

This is all for a MonteCarlo simulation of the following probabilistic question

We have two independent urns, each one with n white balls and n black balls. Then we take a pair of balls, one of each urn, each time up to empty the urns. What is the probability that each pair have the same color?

What I did is the following:

using Random
# Auxiliar function that compare the parity, element by element, of two 
# random vectors of length 2n
function comp(n::Int64)
  sum((shuffle!(Vector(1:2*n)) .+ shuffle!(Vector(1:2*n))).%2)
end

The above generate two random permutations of the vector from 1 to 2n, add element by element, apply modulo 2 to each elemnt and after sum all the values of the remaining vector. Then Im using above the parity of each number to model it color: odd black and white even.

If the final sum is zero then the two random vectors had the same colors, element by element. A different result says that the two vectors doesnt had paired colors.

Then I setup the following function, that it is just the MonteCarlo simulation of the desired probability:

  # Here m is an optional argument that control the amount of random
  # experiments in the simulation
  function sim(n::Int64,m::Int64=24)
  # A counter for the valid cases
  x = 0
  for i in 1:2^m
    # A random pair of vectors is a valid case if they have the
    # the same parity element by element so
   if comp(n) == 0
    x += 1
   end
  end
  # The estimated value
   x/2^m
  end

Now I want to know if there is a faster way to compare such vectors. I tried the following alternative construction and comparison for the random vectors

shuffle!( repeat([0,1],n)) == shuffle!( repeat([0,1],n))

Then I changed accordingly the code to

comp(n)

With these changes the code runs slightly slower, what I tested with the function @time. Other changes that I did was changing the forstatement for a whilestatement, but the computation time remain the same.

Because Im not programmer (indeed just yesterday I learn something of the Julia language, and installed the Juno front-end) then probably will be a faster way to make the same computations. Some tip will be appreciated because the effectiveness of a MonteCarlo simulation depends on the number of random experiments, so the faster the computation the larger values we can test.


Solution

  • The key cost in this problem is shuffle! therefore in order to maximize the simulation speed you can use (I add it as an answer as it is too long for a comment):

    function test(n,m)
      ref = [isodd(i) for i in 1:2n]
      sum(all(view(shuffle!(ref), 1:n)) for i in 1:m) / m
    end
    

    What are the differences from the code proposed in the other answer:

    1. You do not have to shuffle! both vectors; it is enough to shuffle! one of them, as the result of the comparison is invariant to any identical permutation of both vectors after independently shuffling them; therefore we can assume that one vector is after random permutation reshuffled to be ordered so that it has trues in the first n entries and falses in the last n entries
    2. I do shuffle! in-place (i.e. ref vector is allocated only once)
    3. I use all function on the fist half of the vector; this way the check is stopped as I hit first false; if I hit all true in the first n entries I do not have to check the last n entries as I know they are all false so I do not have to check them