Tags: python, list, intersection, sampling

Generating a random sample list from a population that has no elements in common with another list


I want to sample elements from a list such that none of the sampled elements appear in another list of specified elements, and I want to keep generating new samples until a non-intersecting one is produced. The code below is what I came up with, but it does not work: whenever the initial sample intersects, it goes into an infinite loop, and the print output shows that all of the generated samples are the same.

import random 
unique_entities=['100','1001','10001','100001','11111']
pde_fin= ['2151', '2146', '2153', '2135', '2158', '2160', '2137', '2169', '2147', '2015', '2022', '2173', '2028', '2014', '2018', '2009', '1140', '1085', '1136', '1132', '1007', '1080', '1078', '1131', '1106', '1164', '1092', '1108', '1118', '1045', '1051', '1006','1001']
random_entities=random.sample(unique_entities,3) #choses 5 unique entities 
while(not(set(random_entities).isdisjoint(pde_fin))):
       random_entites=random.sample(unique_entities,5)
       print(random_entities,"random_entites")

print(unique_entities)

Can you please help me understand what is going wrong?


Solution

  • There are two issues with the line random_entites=random.sample(unique_entities,5):

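    • First, there is a typo: you wrote random_entites instead of random_entities. The loop therefore assigns each new sample to a different variable, so random_entities, which the while condition tests and the print displays, never changes. That is why every printed sample is the same.
    • Second, you're taking a sample of 5 elements from unique_entities, which contains only 5 elements in total. Such a sample is just a reordering of the whole list, so it always contains '1001', the one element that is also in pde_fin, and the loop would never terminate even with the typo fixed.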
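To make the second point concrete, here is a small check (a sketch, not part of the original program; the names draws and always_contains_1001 are made up for illustration). It draws many full-size samples and confirms that each one is a reordering of the whole list, so '1001' is always present:

```python
import random

unique_entities = ['100', '1001', '10001', '100001', '11111']

# Sampling k == len(population) items returns a shuffled copy of the list,
# so every draw necessarily contains '1001'.
draws = [random.sample(unique_entities, len(unique_entities)) for _ in range(1000)]

always_contains_1001 = all('1001' in draw for draw in draws)
print(always_contains_1001)
```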

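    Here is a working version of the program, with a few other tweaks: the sample size is defined once as sample_size, and each sample is converted to a set right away so that isdisjoint can be called on it directly: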

    import random
    
    unique_entities = ['100', '1001', '10001', '100001', '11111']
    pde_fin = ['2151', '2146', '2153', '2135', '2158', '2160', '2137', '2169', '2147', '2015', '2022', '2173', '2028',
               '2014', '2018', '2009', '1140', '1085', '1136', '1132', '1007', '1080', '1078', '1131', '1106', '1164',
               '1092', '1108', '1118', '1045', '1051', '1006', '1001']
    
    sample_size = 3
    
    random_entities = set(random.sample(unique_entities, sample_size))
    print(f"{random_entities=}")
    while not random_entities.isdisjoint(pde_fin):
        random_entities = set(random.sample(unique_entities, sample_size))
        print(f"{random_entities=}")
    
    print(f"Result: {random_entities}")
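If the retry loop itself is not a requirement, a different approach is to remove the excluded elements from the population first and then sample once. This is a sketch of that alternative rather than a correction of the loop above; the names excluded and candidates are introduced here for illustration. It also fails fast when the requested sample size is larger than the number of usable elements, a case in which the retry loop would spin forever:

```python
import random

unique_entities = ['100', '1001', '10001', '100001', '11111']
pde_fin = ['2151', '2146', '2153', '2135', '2158', '2160', '2137', '2169', '2147', '2015', '2022', '2173', '2028',
           '2014', '2018', '2009', '1140', '1085', '1136', '1132', '1007', '1080', '1078', '1131', '1106', '1164',
           '1092', '1108', '1118', '1045', '1051', '1006', '1001']

sample_size = 3

# Build the exclusion set once so each membership test is O(1).
excluded = set(pde_fin)

# Keep only the elements that can legally appear in a sample.
candidates = [entity for entity in unique_entities if entity not in excluded]

if sample_size > len(candidates):
    raise ValueError("not enough non-excluded elements to draw the sample")

# A single draw is enough: no candidate intersects pde_fin.
random_entities = random.sample(candidates, sample_size)
print(f"Result: {random_entities}")
```

Because every candidate is already known to be absent from pde_fin, the result is disjoint by construction and no retries are needed.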