python, object-detection, tensorflow-datasets

Checking whether each image file (.JPG) in folder "A" has an annotation file (.XML) in folder "B"


I have a very large dataset of images and their annotations saved in two separate folders; however, not all images have an annotation file. How can I write Python code that checks each image file (.JPG) in folder "A", deletes the image if folder "B" has no annotation file (.xml) with the same name, and does nothing if the annotation file exists?

I have written code following @Gabip's comment below (it was posted as an image, which is not reproduced here).

How can I improve this code?


Solution

  • Try this:

    from os import listdir, remove
    from os.path import isfile, join, splitext

    images_path = "full/path/to/folder_a"
    annotations_path = "full/path/to/folder_b"


    # Return the names of all regular files in full_path whose extension matches ext (case-insensitive).
    def get_files_names_with_extension(full_path, ext):
        return [f for f in listdir(full_path) if isfile(join(full_path, f)) and f.lower().endswith(".{}".format(ext))]


    images = get_files_names_with_extension(images_path, "jpg")
    # splitext strips only the final extension, so names with extra dots (e.g. "img.001.xml") still match.
    annotations = {splitext(f)[0] for f in get_files_names_with_extension(annotations_path, "xml")}

    for img in images:
        # Delete the image only when no annotation file shares its base name.
        if splitext(img)[0] not in annotations:
            remove(join(images_path, img))
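
Since deleting files cannot be undone, it may be worth listing the affected images before removing anything. The following is a minimal sketch of the same idea using `pathlib`, not a drop-in replacement for the answer above. The function names `find_unannotated_images` and `delete_unannotated_images` and the `dry_run` flag are hypothetical, introduced only for illustration, and the sketch assumes both folders are flat (no subfolders).

```python
from pathlib import Path


def find_unannotated_images(images_dir, annotations_dir,
                            image_ext=".jpg", annotation_ext=".xml"):
    """Return the image paths in images_dir that have no same-named annotation file."""
    images_dir = Path(images_dir)
    annotations_dir = Path(annotations_dir)
    # Stems (file names without the final extension) of all annotation files.
    annotated = {
        p.stem
        for p in annotations_dir.iterdir()
        if p.is_file() and p.suffix.lower() == annotation_ext
    }
    # Images whose stem does not appear among the annotation stems.
    return sorted(
        p
        for p in images_dir.iterdir()
        if p.is_file() and p.suffix.lower() == image_ext and p.stem not in annotated
    )


def delete_unannotated_images(images_dir, annotations_dir, dry_run=True):
    """Delete images without annotations; with dry_run=True only report them."""
    orphans = find_unannotated_images(images_dir, annotations_dir)
    for path in orphans:
        if dry_run:
            print("would delete:", path)
        else:
            path.unlink()
    return orphans
```

Calling `delete_unannotated_images("full/path/to/folder_a", "full/path/to/folder_b")` only prints the candidates; passing `dry_run=False` actually removes them. Matching is case-sensitive on the base name, so `Cat.jpg` and `cat.xml` would not be treated as a pair.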