I have the following requirement which I am trying to meet in Windows 10 using Python script:
Change all the filenames to lowercase in multiple folders recursively. For this, I used the following code:
import os path = "C://Users//shilpa//Desktop//content"
for dir,subdir,listfilename in os.walk(path):
for filename in listfilename:
new_filename = filename.lower()
src = os.path.join(dir, filename)
dst = os.path.join(dir, new_filename)
os.rename(src,dst)
Update the references of these files embedded in certain tag. The tag here is
<img href=(filename.png)>
.
Here, the <img href=>
is constant and the filenames filename.png are different.
So, here is the example:
Existing filenames:
ABC.dita
XYZ.dita
IMG.PNG
These are referenced in different files, say
IMG.PNG
is referenced in XYZ.dita
.
After step1, these change as follows:
abc.dita
xyz.dita
img.png
This will break all the references included in different files.
I want to update all the changed filename references so that the links stay intact.
I don't have any experience with Python and a beginner only. To achieve step2, I should be able to use regex and find a pattern, say,
<img href="(this will be a link to the IMG.PNG>"
.
This will be a part of .dita
file.
After step1, the reference in the file will break.
How can I make changes to the filename and also retain their references? The ask here is, find and replace the old names by new names in all the files.
Any help is appreciated.
I'll assume you hav e a folder containing all the relevant files. So the problem is divided into two parts:
This can be done using glob
or os.walk
.
import os
upper_directory = "[insert your directory]"
for dirpath, directories, files in os.walk(upper_directory):
for fname in files:
path = os.path.join(dirpath, fname)
lower_file_references(path)
When you have the path of the file, you need to read the data from it:
with open(path) as f:
s = f.read()
Once you have the data of the referencing files, you can use this code to lower the reference string:
head = "<img href="
tail = ">"
img_start = s.find(head, start)
while img_start != -1:
img_end = s.find(">", img_start)
s = s[:img_start] +s[img_start:img_end].lower() + s[img_end:]
img_start = img_end
Alternately you can use some XML parsing module. For example BeautifulSoup, this would help avoiding problems like href=
vs href =
from bs4 import BeautifulSoup as bs
s = bs(s)
imgs = s.find_all("img")
for i in imgs:
if "href" in i.attrs:
i.attrs["href"] = i.attrs["href"].lower()
s = str(s)
In both ways you can then rewrite the files. you can do it like that:
with open(path, "w") as f:
f.write(s)
import os
from bs4 import BeautifulSoup as bs
def lower_file_references(file_path):
with open(path) as f:
s = f.read()
s = bs(s)
imgs = s.find_all("img")
for i in imgs:
if "href" in i.attrs:
i.attrs["href"] = i.attrs["href"].lower()
s = str(s)
with open(path, "w") as f:
f.write(s)
upper_directory = "[insert your directory]"
for dirpath, directories, files in os.walk(upper_directory):
for fname in files:
path = os.path.join(dirpath, fname)
lower_file_references(path)
I must say that this is a simple method, which will work great assuming your files aren't huge. If you have big files that can't be read to memory all at once, or a lot of files you might want to think of ways avoiding reading the data of all the files.