python, wildcard, subdirectory

How to search with a wildcard for a subfolder that doesn't exist yet in Python


I'm trying to download some web pages with pywebcopy. I use this library because it clones a page exactly, but it tries to download every file the page references. As a result, it sometimes gets stuck on a file and, I guess, goes into an infinite loop. (I never waited more than 10 minutes.) By that point it has already downloaded what I want, which is the complete web page. So I want to terminate the process once that file has been downloaded and move on to the next web page in a loop.

I would do it with a while loop, but the folder structure is deeply nested. And because the folders don't exist until the library creates them, I couldn't search for them with os.path.

The folder structure is like this:

main_folder
├───subfolder1
│   ├───some_folder1
│   └───some_folder2
│           some_image.png
│
└───subfolder2
    └───sub_subfolder1
        └───sub_subfolder2
            └───sub_subfolder3
                └───sub_subfolder4
                    └───sub_subfolder5
                        │   index.html
                        │   some.pwc
                        │
                        └───amp
                                the_file_I_want.pwc

The file I need is always in the amp folder, so basically I need to find that folder and check whether the file is there. However, the names of sub_subfolder3, sub_subfolder4 and sub_subfolder5 change for each web page, so I have to search with a wildcard, something like "main_folder/subfolder2/**/amp/*.pwc". But the folders don't exist before the download starts.

What I want to do is something like this:

from pywebcopy import save_webpage
import glob
...

pattern = 'main_folder/subfolder2/**/amp/*.pwc'
while glob.glob(pattern).is_file() = False:
    save_webpage(url, download_folder, **kwargs)

This is invalid syntax, but it is exactly what I want. I've searched but couldn't come up with a solution. Any help would be highly appreciated.


Solution

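  • Try this (it needs import os in addition to import glob):

    while not any(os.path.isfile(i) for i in glob.iglob(pattern, recursive=True)):
        save_webpage(url, download_folder, **kwargs)  # loop body taken from the question

    Note the not: the loop has to keep running while no matching file exists. recursive=True is required for ** to match any number of nested folders, and glob simply yields nothing while the folders don't exist yet, so no os.path check on the parent folders is needed.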
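
If save_webpage never returns (the stuck case described in the question), a loop in the same process never gets the chance to re-check the pattern. A rough sketch of one way around that, assuming the download can run in a child process that is safe to kill: poll the glob pattern from the parent and terminate the child once a match appears. find_match and run_until_match are hypothetical helper names, not part of pywebcopy, and the timeout and poll interval are arbitrary defaults.

```python
import glob
import multiprocessing
import os
import time


def find_match(pattern):
    """Return the first existing file matching pattern, or None."""
    # recursive=True lets '**' span any number of nested directories.
    # glob yields nothing (no error) if the folders do not exist yet.
    for path in glob.iglob(pattern, recursive=True):
        if os.path.isfile(path):
            return path
    return None


def run_until_match(target, args, pattern, timeout=600.0, poll_interval=1.0):
    """Run target(*args) in a child process and stop it as soon as a file
    matching pattern exists, the child exits, or timeout seconds pass.

    Hypothetical helper: returns the matching path, or None if none appeared.
    """
    worker = multiprocessing.Process(target=target, args=args)
    worker.start()
    deadline = time.monotonic() + timeout
    found = None
    try:
        while time.monotonic() < deadline:
            found = find_match(pattern)
            if found is not None or not worker.is_alive():
                break
            time.sleep(poll_interval)
    finally:
        # Kill the download if it is still running, then reap the process.
        if worker.is_alive():
            worker.terminate()
        worker.join()
    # One last look in case the file appeared just before the child exited.
    return found or find_match(pattern)
```

With the names from the question, this would be called as run_until_match(save_webpage, (url, download_folder), pattern) for each URL, placed under an if __name__ == "__main__": guard on Windows. A file can match while it is still being written, so a short extra delay before terminating may be needed.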