python encoding filepath file-rename pypdf

os.rename() disrupted due to unwanted "/", which won't get replaced


I am trying to rename files using information obtained through PdfReader, from the PyPDF2 library. Sometimes the information (in this case the title, obtained with reader.metadata.title) contains forward slashes ("/"), which disrupt the renaming because they are treated as directory separators in the destination path I pass to os.rename(). I have tried to replace the slashes with "-" by calling the string's replace() method on the title, but for some reason this doesn't work, and I get a FileNotFoundError when I try to rename. I have double-checked that the type of the variable holding reader.metadata.title is str, so in theory replace() should apply. Is the "TOC/TOC" shown in my output example below some sort of encoding that needs to be dealt with differently? Thanks.

My code:

import os

from PyPDF2 import PdfReader


for pdf_file in os.listdir(downloads_path):
    if pdf_file.endswith(".pdf"):
        current_file_path = os.path.join(downloads_path, pdf_file)
        reader = PdfReader(open(current_file_path, "rb"))
        new_name_pdf_file = reader.metadata.title
        new_name_pdf_file.replace("/", "-")

        # output example: 'Outside Back Cover - Graphical abstract TOC/TOC in double column/Cover image legend if applicable, Bar code, Abstracting and Indexing information'
        print(new_name_pdf_file)
        new_pdf_destination = os.path.join(destination_path, new_name_pdf_file)
        os.rename(current_file_path, new_pdf_destination)

Output error example:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/me/Documents/temporary_downloads_folder/Outside-Back-Cover---Graphical-abstract-TOC-TOC-in-double-column-C_2022_Nano.pdf' -> '/Users/me/Documents/destination_folder/Outside Back Cover - Graphical abstract TOC/TOC in double column/Cover image legend if applicable, Bar code, Abstracting and Indexing information.pdf'

Solution

  • The line

    new_name_pdf_file.replace("/", "-")
    

    doesn't do what you think it does. It does not modify the string that new_name_pdf_file refers to, and it cannot: strings are immutable in Python. Instead, replace() returns a new string with the replacement applied, and that result is discarded here because it is not assigned to anything.

    Change the line to

    new_name_pdf_file = new_name_pdf_file.replace("/", "-")
    

    and it should work.
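
To make the difference concrete, here is a minimal sketch that contrasts the discarded call with the assigned one. The helper name safe_pdf_name is hypothetical (it is not part of PyPDF2 or the question's code), and appending ".pdf" is an assumption based on the destination path shown in the error message.

```python
import os


def safe_pdf_name(title, replacement="-"):
    """Build a file name from `title` with path separators replaced.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    # str.replace returns a new string, so the result has to be kept.
    cleaned = title.replace("/", replacement)
    # os.sep is "/" on macOS/Linux and "\\" on Windows; replace it too.
    cleaned = cleaned.replace(os.sep, replacement)
    # Assumption: the renamed file should keep a .pdf extension.
    return cleaned + ".pdf"


title = "Graphical abstract TOC/TOC in double column/Cover image legend"

# Result discarded: `title` still contains the slashes afterwards.
title.replace("/", "-")

# Result kept: `new_name` holds the cleaned file name.
new_name = safe_pdf_name(title)
```

In the loop from the question, the value returned by such a helper would be what gets passed to os.path.join(destination_path, ...) before calling os.rename().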