
os.listdir() - choose randomly from the returned list based on a condition


I have a directory containing thousands of images from three different domains.

Say the file names look like xxx_A.png, yyy_B.png and zzz_C.png, with thousands of files from each domain.

os.listdir() returns a list of all the image names in the directory.

I then want to filter this list according to some percentages.

Example: out of these thousands of images I want only 100, shuffled, where 30% are from domain A, 30% from domain B and 40% from domain C.

So, given a total count and these percentages, I want to pick that many random images (the domain is determined by the file name), and the result becomes the new list.

Example:

Input:

['1_A.png', '2_A.png', '3_A.png', '4_A.png', '5_A.png', '6_A.png', '7_A.png', '8_A.png', '9_A.png', '10_A.png', '1_B.png', '2_B.png', '3_B.png', '4_B.png', '5_B.png', '6_B.png', '7_B.png', '8_B.png', '9_B.png', '10_B.png', '1_C.png', '2_C.png', '3_C.png', '4_C.png', '5_C.png', '6_C.png', '7_C.png', '8_C.png', '9_C.png', '10_C.png']

I want 12 images, 30% from domain A, 30% from domain B and 40% from domain C

Output:

 ['1_C.png', '10_C.png', '2_B.png', '4_A.png', '3_A.png', '9_C.png', '7_C.png', '6_A.png', '8_B.png', '10_B.png', '3_C.png', '5_C.png']

How can I do this?


Solution

  • Below is a function I defined. As Martin stated, math.ceil is probably the best function for computing the number of files per domain, so you never end up with fewer than the requested amount. You also want to sample without replacement (no repeated file names), so you should not call random.choice repeatedly like Rakesh did, since random.choice can return the same element more than once. Shuffling each sub-list with random.shuffle and taking a slice avoids this problem.

    Input:

    import random
    import math

    os_dir_list = ['1_A.png', '2_A.png', '3_A.png', '4_A.png', '5_A.png', '6_A.png', '7_A.png', '8_A.png', '9_A.png', '10_A.png', '1_B.png', '2_B.png', '3_B.png', '4_B.png', '5_B.png', '6_B.png', '7_B.png', '8_B.png', '9_B.png', '10_B.png', '1_C.png', '2_C.png', '3_C.png', '4_C.png', '5_C.png', '6_C.png', '7_C.png', '8_C.png', '9_C.png', '10_C.png']

    def shuffle_pick(os_dir_list, length, tuple_list):
        shuffled_list = []
        for letter, percent in tuple_list:
            # All file names belonging to this domain, in random order
            sub_list = [img for img in os_dir_list if img.endswith(letter + '.png')]
            random.shuffle(sub_list)
            # Round up this domain's share of the requested total
            num = int(math.ceil(length * percent / 100))
            shuffled_list += sub_list[:num]
        # Rounding up can overshoot, so trim to the requested length
        return shuffled_list[:length]

    print(shuffle_pick(os_dir_list, 12, [('A', 30), ('B', 30), ('C', 40)]))
    

    Output:

    ['1_A.png', '5_A.png', '3_A.png', '6_A.png', '1_B.png', '7_B.png', '9_B.png', '5_B.png', '10_C.png', '4_C.png', '3_C.png', '9_C.png']
    

    You can also call random.shuffle(shuffled_list) before the return statement to shuffle the output list.
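
Rounding every share up and then trimming the combined list means the cut always falls on the last domain. One alternative, sketched here and not taken from the answers above, is to assign the counts with the largest-remainder method and draw with random.sample, which samples without replacement. The function name pick_by_domain and the dict-based shares argument are made up for illustration, and the sketch assumes the percentages sum to 100 and that every domain has enough files.

```python
import math
import random


def pick_by_domain(names, total, shares):
    """Pick `total` names, split across domains by percentage.

    `shares` maps a domain letter to its percentage,
    e.g. {'A': 30, 'B': 30, 'C': 40} (hypothetical interface).
    """
    if sum(shares.values()) != 100:
        raise ValueError("percentages must sum to 100")

    # Ideal (fractional) count for each domain.
    exact = {d: total * p / 100 for d, p in shares.items()}
    # Start from the floor of each ideal count...
    counts = {d: math.floor(x) for d, x in exact.items()}
    # ...then give the leftover slots to the largest fractional parts.
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1

    picked = []
    for d, k in counts.items():
        pool = [n for n in names if n.endswith('_' + d + '.png')]
        # random.sample never repeats an element; it raises ValueError
        # if the pool holds fewer than k names.
        picked += random.sample(pool, k)

    random.shuffle(picked)  # mix the domains in the final list
    return picked


names = [f"{i}_{d}.png" for d in "ABC" for i in range(1, 11)]
print(pick_by_domain(names, 12, {'A': 30, 'B': 30, 'C': 40}))
```

With 12 images and a 30/30/40 split the ideal counts are 3.6, 3.6 and 4.8, so domain C gets 5 files and the remaining 7 are divided between A and B. In real use, names would come from os.listdir() on your image directory.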