Search code examples
pythonnumpymontecarlo

How to make sure a pair of items in np.random.choice don't appear together in a loop?


I'm currently trying to run a MonteCarlo simulation without replacement, using np.random.choice, but there are certain items in the list which cannot appear together. For example, if I have a list of five items: RO, MI, VE,NA, SI, and each loop produces a group of four items, RO can appear with VE, MI, or NA, but it cannot appear with SI, on each iteration. So a loop like: [RO,MI,VE,NA] is correct But not: [RO,MI,SI,NA] as SI appears in the same group as RO. This is the code I'm currently using (which is producing the incorrect grouping):

np.random.seed(42)
portfolio=[]
for i in range(0,10000):
    port_gar = list(np.random.choice(cities, size=4, replace=False, p=cities_prob))
    portfolio.append(port_gar)
print(portfolio)

I'm at a loss as to what would do and any help would be appreciated. Thanks!


Solution

  • I present two solutions, both of which make use of the random module from the standard library. I recommend this approach over using numpy since you can't really take advantage of numpy here in any way, and especially because you're not even ending up with a numpy array at the end (you're only using numpy for random number generation).

    Solution 1: every sample has exactly one city from the "exclusive cities" group

    Something you could do is have your cities which don't play nice in a separate group from the others:

    import random
    random.seed(42)
    
    
    n = 10  # whatever total number of items you want per sample
    num_samples = 10000  # total number of samples
    
    
    free_cities = [...]  # all cities in this group play nicely together
    exclusive_cities = [...]  # cities in this group cannot be grouped in a sample, so we will only ever choose one city at a time from this group
    
    
    portfolio = []
    for _ in range(num_samples):
        free_sample = random.sample(free_cities, k=(n-1))
        exclusive_sample = random.sample(exclusive_cities, k=1)
        portfolio.append(random.sample(free_sample + exclusive_sample, k=n))
    

    Here's a contrived example:

    n = 6
    num_samples = 10
    
    free_cities = ["AW", "JB", "EX", "HZ", "MZ", "XQ", "KA", "MW", "WZ", "UD"]
    exclusive_cities = ["RO", "VE", "SI", "NA"]
    
    
    portfolio = []
    for _ in range(num_samples):
        free_sample = random.sample(free_cities, k=(n-1))
        exclusive_sample = random.sample(exclusive_cities, k=1)
        portfolio.append(random.sample(free_sample + exclusive_sample, k=n))
    

    Output:

    >>> portfolio
    [['AW', 'EX', 'WZ', 'RO', 'MZ', 'JB'],
     ['RO', 'MZ', 'KA', 'WZ', 'XQ', 'EX'],
     ['AW', 'MW', 'VE', 'JB', 'WZ', 'XQ'],
     ['NA', 'WZ', 'JB', 'MW', 'EX', 'MZ'],
     ['SI', 'UD', 'AW', 'JB', 'WZ', 'MW'],
     ['MZ', 'JB', 'UD', 'VE', 'WZ', 'HZ'],
     ['NA', 'KA', 'UD', 'AW', 'MW', 'WZ'],
     ['JB', 'NA', 'UD', 'KA', 'WZ', 'MW'],
     ['MZ', 'RO', 'JB', 'HZ', 'AW', 'XQ'],
     ['WZ', 'HZ', 'UD', 'JB', 'VE', 'XQ']]
    

    Now something to be aware of is that this will guarantee that every sample has exactly one city from the exclusive_cities group:

    for sample in portfolio:
        print(set(sample) & set(exclusive_cities))
    

    Output:

    # Every sample has one item from the exclusive_cities group
    {'RO'}
    {'RO'}
    {'VE'}
    {'NA'}
    {'SI'}
    {'VE'}
    {'NA'}
    {'NA'}
    {'RO'}
    {'VE'}
    

    Solution 2: every sample has at most one city from the "exclusive cities" group

    If that's not your desired behavior, you can use an additional "coin flip" to determine if 1 or 0 exclusive cities show up in a given sample with this minor modification to the above logic:

    n = 6
    num_samples = 10
    
    
    portfolio = []
    for _ in range(num_samples):
        include = random.choice((0, 1))
        free_sample = random.sample(free_cities, k=(n - include))
        exclusive_sample = random.sample(exclusive_cities, k=include)
        portfolio.append(random.sample(free_sample + exclusive_sample, k=n))
    

    Now your samples may or may not include an exclusive city, but still only ever one max:

    # An empty set means an exclusive city isn't present in that sample
    set()
    set()
    {'SI'}
    {'NA'}
    set()
    {'RO'}
    {'SI'}
    {'VE'}
    set()
    {'RO'}