python  temporary-files  code-cleanup  pdf2image

Python tempfile.TemporaryDirectory() cleanup crashes with PermissionError and NotADirectoryError


Premise

I'm trying to convert some PDFs to images via pdf2image and poppler, so that I can then run some computer vision tasks on them.

The conversion itself works fine.

However, the conversion creates some artifacts for each page of the PDF as it is being converted, and I would like these to be deleted at the end of the function. To facilitate this, I am using tempfile.TemporaryDirectory(). The function looks as follows:

    with tempfile.TemporaryDirectory() as path:
        images_from_path: [Image] = convert_from_path(
                os.path.join(path_superfolder, "calibration_target.pdf"),
                size=(2480, 3508),
                output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
        if len(images_from_path) >= page:
            images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))

Problem

The trouble is that the program always crashes with the following errors after converting the PDF and writing the required image to a file.

Traceback (most recent call last):
  File "C:\Python310\lib\shutil.py", line 617, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\tempfile.py", line 843, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file, because it is being used by another process: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "E:\PyCharm 2022.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 393, in <module>
    extract_calibration_page_as_image_from_pdf()
  File "D:\Dokumente\Uni\Informatik\BA_Thesis\tumexam-scheduling-codebase\generate_data.py", line 190, in extract_calibration_page_as_image_from_pdf
    tmp_dir.cleanup()
  File "C:\Python310\lib\tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Python310\lib\shutil.py", line 749, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python310\lib\shutil.py", line 619, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Python310\lib\tempfile.py", line 846, in onerror
    cls._rmtree(path, ignore_errors=ignore_errors)
  File "C:\Python310\lib\tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Python310\lib\shutil.py", line 749, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Python310\lib\shutil.py", line 600, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "C:\Python310\lib\shutil.py", line 597, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] Directory name invalid: 'C:\\Users\\tobia\\AppData\\Local\\Temp\\tmp24c4bmzv\\bd76d834-672e-49fc-ac30-7751b7b660d0-01.ppm'

When stepping through the cleanup routine, everything seems fine at first: the path is correct and it starts deleting files. At some point, however, the internal path variable ends up holding the path of a file instead of a directory, and the routine crashes, because obviously a file is not a directory. To me it looks like a race condition is causing problems here.

What I have already tried

  • Rewriting the function to not use with and instead call the cleanup routine explicitly via tmp_dir.cleanup(). The crash is the same.
  • Just creating the directory without populating it with the conversion artifacts. The cleanup works in this case.
  • The documentation for tempfile mentions that a PermissionError can occur when files are still open. However, the files are only used in this function, and if this is what is causing the error, I am unsure where the files are still open or which function is responsible. My suspicion, of course, would be the conversion function.
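
The open-file suspicion from the last bullet can be checked in isolation, without pdf2image. The sketch below is only a stand-in: it assumes a plain file handle plays the role of whatever object still holds an artifact open, and `cleanup_succeeds` and the file name `page-01.ppm` are hypothetical. On Windows the cleanup is expected to fail while the handle is open; on most other platforms an open file can still be unlinked, so it succeeds either way. Judging from the traceback above (tempfile.py, line 846), the NotADirectoryError appears to come from tempfile's own error handler retrying rmtree on the locked file's path.

```python
import os
import shutil
import tempfile


def cleanup_succeeds(keep_open: bool) -> bool:
    """Return True if cleanup() works with or without an open file in the directory."""
    tmp_dir = tempfile.TemporaryDirectory()
    artifact = os.path.join(tmp_dir.name, "page-01.ppm")  # hypothetical artifact name
    with open(artifact, "wb") as fh:
        fh.write(b"P6 demo")

    # Stand-in for an object that keeps the artifact open (assumption).
    handle = open(artifact, "rb") if keep_open else None
    try:
        tmp_dir.cleanup()
        return True
    except OSError:
        # PermissionError and NotADirectoryError are both subclasses of OSError.
        return False
    finally:
        if handle is not None:
            handle.close()
        # Remove whatever a failed cleanup left behind.
        shutil.rmtree(tmp_dir.name, ignore_errors=True)
```

Calling `cleanup_succeeds(keep_open=True)` and `cleanup_succeeds(keep_open=False)` on the affected machine should show whether an open handle alone is enough to reproduce the crash.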

Solution

  • While experimenting some more and writing this question, I found a working solution:

        with tempfile.TemporaryDirectory() as path:
            images_from_path: [Image] = convert_from_path(
                    os.path.join(path_superfolder, f"calibration_target_{exam_type}.pdf"),
                    size=(2480, 3508),
                    output_folder=path, poppler_path=r'E:\poppler-22.04.0\Library\bin')
            if len(images_from_path) >= page:
                images_from_path[page - 1].save(os.path.join(path_superfolder, "result.jpg"))
            images_from_path = []
    

    It seems that the routine had trouble cleaning up because the converted images are in fact the artifacts created by pdf2image, and they were still being held by my data structure. Resetting the data structure before the cleanup is implicitly triggered fixed the issue.
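
    Rebinding the list relies on the objects being released as soon as the last reference goes away. A more explicit variant is to close every object before the directory is removed. The following is a minimal sketch of that idea, not a tested pdf2image recipe: `run_with_closed_artifacts` and `make_demo_files` are hypothetical names, and plain file handles stand in for the images.

```python
import contextlib
import os
import tempfile


def run_with_closed_artifacts(make_artifacts, use_artifacts):
    """Create artifacts in a temporary directory, use them, and close them before cleanup."""
    with tempfile.TemporaryDirectory() as path:
        with contextlib.ExitStack() as stack:
            # closing() calls .close() on each object when the stack unwinds,
            # which happens before TemporaryDirectory removes the directory.
            artifacts = [
                stack.enter_context(contextlib.closing(obj))
                for obj in make_artifacts(path)
            ]
            return use_artifacts(artifacts)


def make_demo_files(path):
    """Stand-in for the conversion step: writes files and returns open handles."""
    handles = []
    for number in (1, 2):
        name = os.path.join(path, f"page-{number:02d}.ppm")
        with open(name, "wb") as fh:
            fh.write(f"page {number}".encode())
        handles.append(open(name, "rb"))
    return handles


# Read the second "page" while the directory still exists.
result = run_with_closed_artifacts(make_demo_files, lambda pages: pages[1].read())
```

    With pdf2image, `make_artifacts` could be something like `lambda path: convert_from_path(pdf_path, output_folder=path)` and `use_artifacts` would save the wanted page. This assumes the returned PIL images release their file when `close()` is called, which I have not verified against this poppler setup.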

    If there is a better way of tackling this issue, please do not hesitate to inform me.