If I have documents labeled:
2023_FamilyDrama.pdf
2024_FamilyDrama.pdf
2022-beachpics.pdf
2020 Hello_world bring fame.pdf
2019-this-is-my_doc.pdf
I would like them to be
FamilyDrama_2023.pdf
FamilyDrama_2024.pdf
beachpics_2022.pdf
Hello_world bring fame_2020.pdf
this-is-my_doc_2019.pdf
So far, I know how to remove a string from the beginning of the filename, but do not know how to save it and append it to the end of the string.
import os
for root, dir, fname in os.walk(PATH):
for f in fname:
os.chdir(root)
if f.startswith('2024_'):
os.rename(f, f.replace('2024_', ''))
This solution uses parse instead of regular expressions.
from parse import parse
fnames = [
"2023_FamilyDrama.pdf",
"2024_FamilyDrama.pdf",
"2022-beachpics.pdf"
]
for file_name in fnames:
year,sep,name,ext = parse("{:d}{}{:l}.{}", file_name)
print(f"{name}_{year}.{ext}")
# Output:
# FamilyDrama_2023.pdf
# FamilyDrama_2024.pdf
# beachpics_2022.pdf
The statement parse("{:d}{}{:l}.{}", file_name)
searches for:
{:d}
A number (in your case, the year){}
Any number of characters up until the beginning of the next matching rule. In your case, either a -
or _
.{:l}
Any number of letters until the beginning of the next matching rule..
A period{}
Everything remaining until the end of the string.The output type from parse()
is parse.Result
, but I unpacked it like a tuple
in this example.
You can rename or move the file however you prefer. I like pathlib
from pathlib import Path
from parse import parse
# The directory containing the files.
in_dir = Path(r"C:/mystuff/my_pdfs") # Absolute path. Also look into `Path().resolve()`
# Create a list of all PDFs in the directory
# The initial code snippet had the file names hard-coded as a list of strings.
# This creates a list of `Path` objects, so you get just the name of `f` by calling `f.name`
files = [f for f in in_dir.iterdir() if f.name.find(".pdf") > -1]
# Rename each file, one at a time.
for f in files:
# Extract the subtrings
year,sep,name,ext = parse("{:d}{}{:l}.{}", f.name)
# Rearrange them
new_name = f"{name}_{year}.{ext}"
# Print friendly message for this example
print(f"renaming {f.name} to {new_name}")
# Rename the file
f.rename(new_name)
# Output
# renaming 2022-beachpics.pdf to beachpics_2022.pdf
# renaming 2023_FamilyDrama.pdf to FamilyDrama_2023.pdf
# renaming 2024_FamilyDrama.pdf to FamilyDrama_2024.pdf
It doesn't matter for your example input, but I should note that {:d}
will drop leading zeroes. You can use most format specifications as well. Also see the format examples.
>>> from parse import parse
>>> foo = "0123_abc.pdf"
>>> parse("{0:04}{}{:l}.{}", foo)
<Result ('0123', '_', 'abc', 'pdf') {}>
Alternatively, you can wipe out the hyphens (since you plan to do that anyway). That allows you to hard-code the _
separator and then treat everything as straight format-less {}
.
>>> bar = "0123-abc.pdf"
>>> parse("{}_{}.{}", bar.replace("-","_"))
<Result ('0123', 'abc', 'pdf') {}>
Edit: More complicated file names.
The file names in the question all followed the template [year][separator][letters].[extension]
, so that's why I chose the {:l}
pattern. If instead, you have a filename like 2022-workplace-picture-site-report.pdf
, then you need a rule for the template [year][one character of anything][any number of anything].[extension]
>>> from parse import parse
>>> fname = "2022-workplace-picture-site-report.pdf"
>>> parse("{:d}{}{}.{}", fname)
<Result (2022, '-', 'workplace-picture-site-report', 'pdf') {}>