python, python-3.x, zip

Wrong order in iterating over folder and printing filenames


Please help me print the filenames of the pictures. I either print the same filename for every picture, or the same picture with different filenames.

I want the output to be FileName, then the picture associated with FileName. Instead I am getting FileName0, Pic0, FileName0, Pic1, or FileName0, Pic0, FileName1, Pic0.

I added more code to the original post to clarify what I was trying to do. Hopefully it makes sense. I want to print the name of the image and then display the image. The new code I came up with displays the image, prints the name at the bottom of it followed by None, and then the program terminates. Say the list has 4 images. I want to loop so that it prints the name of image[0] and then displays image[0], then prints the name of image[1] and displays image[1], and so on.

#OLD CODE
with zipfile.ZipFile("file.zip", 'r') as zip_ref:
    zip_ref.extractall("folderName")

    for info in zip_ref.infolist():
        for file in os.listdir("folderName"):
            image=Image.open(file).convert('RGB') 

            print(info.filename)
            display(image)

#NEW CODE
#My current list length is 4
file_name = []
actual_image = []
##Extract all the files and put in folder
with zipfile.ZipFile("readonly/small_img.zip", 'r') as zip_ref:
    zip_ref.extractall("pyproject")

#Add name to list/Add image to list.  Probably should be one list.
for entry in os.scandir("pyproject"):
    file_name.append(entry.name)
for file in os.listdir("pyproject"):
    image=Image.open(file).convert('RGB')
    actual_image.append(image)
#print(info.filename,display(image)) 

#Newer line of code directly above.  
#When the above for loop becomes nested it displays 4 
#pictures with the file number underneath.  Expected result is 1pic to 1 filename.  
#Its closer to what I want.  Will keep trying.

print(len(file_name))

#Returns file names.
def name_of_file(a):
    for names in a:
        return names

#Returns image to be displayed
def image_of_file(b):
    for image in b:
        return (display(image))
##Prints out image name and then displays image
print(name_of_file(file_name),image_of_file(actual_image))

###Dictionary example code:
list_of_pictures = [{image1_of_four :PIL.image,bounding_box,pytesseract_text}]

Solution

  • I think the confusion comes from the double iteration. As far as I can see, this is not necessary, because you just want to iterate over every (image) file in a zipped directory. (If I have understood the question correctly.)

    A single iteration is sufficient here:

    import zipfile
    
    with zipfile.ZipFile("file.zip", 'r') as zip_ref:
        for file in zip_ref.filelist:
            print(file.filename)
            # ...
    
    

    So for processing the files inside the zip archive you could do something like this (of course there are several possibilities, depending on the use case):

    import zipfile
    from PIL import Image, UnidentifiedImageError
    
    with zipfile.ZipFile("file.zip", 'r') as zip_ref:
        for zipped_file in zip_ref.filelist:
            print(f"This is your fileinfo: {zipped_file.filename}")
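            try:
                # Open the archive member as a file-like object and close it again
                # once the image has been loaded and converted.
                with zip_ref.open(zipped_file) as file:
                    image = Image.open(file).convert('RGB')
            except UnidentifiedImageError:
                print(f"Error processing {zipped_file.filename}")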
    

    If you need more information about the iterated (zipped) files, the infolist() method is fine as well: it returns a list of ZipInfo objects, one per archive member (the same objects that filelist holds).
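
    As a rough illustration of what a ZipInfo object offers, here is a minimal sketch that builds a throwaway archive in memory and reads a few attributes per entry. The entry names and contents are made up for the example; filename, file_size and is_dir() are part of the standard zipfile module.

```python
import io
import zipfile

# Build a small archive in memory so the sketch runs without any files on disk.
# The entry names and contents are invented for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_out:
    zip_out.writestr("images/a.png", b"not a real image")
    zip_out.writestr("images/b.png", b"also not a real image")

summary = []
with zipfile.ZipFile(buffer, "r") as zip_ref:
    for info in zip_ref.infolist():
        # Each entry is a ZipInfo object describing one archive member.
        summary.append((info.filename, info.file_size, info.is_dir()))
        print(info.filename, info.file_size, info.is_dir())
```

    Checking is_dir() can be handy if the archive contains folder entries that should be skipped before trying to open them as images.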

    Update after question editing:

    As far as I can see, there is still a picture to be displayed and the corresponding file name to be printed. If my assumption is correct, then there are several issues with the presented code:

    • There is no need to iterate several times, whether in one nested loop or in several loops in a row. Fewer iterations also mean less complexity. In detail, you currently use separate passes to 1. unzip the files (zip_ref.extractall() already iterates internally), 2. store the filenames in a list, 3. store the image objects in a list, 4. print the stored filenames, and 5. display the image objects. All of this information is already available while iterating over the files in the archive, or can easily be computed in the current iteration step. That removes the need for separate data structures for file names, image objects, etc.: in each step you already have the file, and therefore its file name and the corresponding image.
    • I still see no reason to unpack the whole archive first. All this can be done in the iteration itself. If the images themselves are to be saved, then unpacking is of course useful. But then you can also simply unzip the files and then iterate over the unzipped files with Python, e.g. with os.scandir(). This was implemented in the updated code. But this is not necessary if you only want to display the current file of each iteration step.
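
    If the files do need to be extracted, the unzip-then-scan variant could look like the sketch below. It uses only the standard library and plain text files as stand-ins for images; extract_and_list and the file names are hypothetical. One detail that may matter for the code in the question: os.listdir() and entry.name give bare file names, which are resolved relative to the current working directory, whereas entry.path includes the target folder and is what you would pass to something like Image.open().

```python
import os
import tempfile
import zipfile


def extract_and_list(zip_path, target_dir):
    """Extract the archive, then return sorted (name, path) pairs of the files."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
    pairs = []
    for entry in os.scandir(target_dir):
        if entry.is_file():
            # entry.path includes target_dir; entry.name is only the bare name.
            pairs.append((entry.name, entry.path))
    return sorted(pairs)


# Demo with a throwaway archive (hypothetical file names, text instead of images).
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zip_out:
        zip_out.writestr("pic0.txt", "first")
        zip_out.writestr("pic1.txt", "second")
    pairs = extract_and_list(zip_path, os.path.join(tmp, "extracted"))
    names = [name for name, _ in pairs]
    for name, path in pairs:
        print(name, "->", path)
```

    Because name and path come from the same entry in the same loop step, they cannot get out of sync.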

    Unfortunately the function of display() is still not known to me. Probably something similar to Image.show() is done there. After the code update within the question, I can only mention small changes to my example to show how easy it can be to display the file name for the corresponding image:

    import os
    import zipfile
    from PIL import Image, UnidentifiedImageError
    
    
    with zipfile.ZipFile("file.zip", 'r') as zip_ref:
        for zipped_file in zip_ref.filelist:
            try:
                image = Image.open(zip_ref.open(zipped_file)).convert('RGB')
                print(os.path.basename(zipped_file.filename))
                image.show()  # simulating: display(image)
                input("Press Enter to show the next image...")
            except UnidentifiedImageError:
                pass
    

    I only print the file name, for which there is also a matching picture. No other prints (to keep things as clear as possible). image.show() is used to simulate the unknown display(image)-function. To make it clear that the corresponding file name refers to the currently opened image, I have included a pause, here in the form of a user prompt (input()).

    All this under the assumption that simply the appropriate file name for a certain image should be displayed. Using only one iteration should be the appropriate solution here.

    Using multiple iterations to store objects in multiple lists (as done in the question) leads to a disadvantage: Higher complexity. In this case, the index positions of the lists have to match each other, and when iterating over one list, you have to access the other list with the same index position like this:

    list_a = [1, 2, 3]
    list_b = ["a", "b", "c"]
    for index, el in enumerate(list_a):
        print(el, list_b[index])
    

    You would have to do this if you want to keep most of your code unchanged. But then you have to make sure that the lists never change (or, better, use tuples), and this is simply more complex (and also more complicated). See also this.
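
    If two parallel lists are kept anyway, the built-in zip() is a common alternative to manual indexing. The sketch below uses placeholder strings instead of real image objects; the names and values are invented for the example.

```python
names = ["pic0.png", "pic1.png", "pic2.png"]
images = ["<image 0>", "<image 1>", "<image 2>"]  # placeholders for image objects

# zip() pairs the elements position by position, so no index bookkeeping is needed.
pairs = list(zip(names, images))
for name, image in pairs:
    print(name, image)
```

    Storing the (name, image) tuples in one list from the start, rather than in two separate lists, avoids the matching-index problem altogether.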