I am trying to rename .doc and .docx files based on a specific piece of text inside each document.
I already have this working for .txt files with the following code:
import os
import re

pat = r"ID number(\d\d\d\d\d)"  # Text to find in the file; the group captures the five-digit ID
ext = '.txt'  # Type of file to process
mydir = ''  # Directory containing the files to rename

for arch in os.listdir(mydir):
    if not arch.endswith(ext):
        continue  # Skip files of any other type
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:
        continue  # No ID in this file, leave it alone
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):  # Do not overwrite an existing file
        os.rename(archpath, newpath)
Does anyone have any suggestions on how to do the same for .doc and .docx files?
I found the answer. The issue was on my end: I was searching the document for a value, but because the value sits inside a table, I needed to read a specific cell instead.
Here is the result:
import os
import re

from docx import Document  # python-docx

pat = r"(\d+)"  # Pattern for the value used as the new filename
ext = '.docx'  # Type of file to process
mydir = ''  # Directory containing the files to rename

for arch in os.listdir(mydir):
    if not arch.endswith(ext):
        continue  # Skip files of any other type
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    table = document.tables[0]  # The value is in the first table
    s = re.search(pat, table.cell(1, 2).text)  # Row index 1, column index 2
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):  # Do not overwrite an existing file
        os.rename(archpath, newpath)
        print(newpath)

input("Press Enter to exit")
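For anyone hitting the same problem: in python-docx, text inside a table is not included in document.paragraphs, so it has to be read through document.tables. If the position of the cell is not known in advance, a minimal sketch like the one below could scan every cell instead. The name find_in_tables is my own invention, not part of python-docx, and the function only assumes the object it receives exposes tables, rows, cells and text the way a python-docx Document does.

```python
import re


def find_in_tables(document, pattern):
    """Return the first regex match found in any table cell, or None.

    `document` is expected to look like a python-docx Document:
    document.tables -> table.rows -> row.cells -> cell.text.
    find_in_tables is a hypothetical helper, not a python-docx function.
    """
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                match = re.search(pattern, cell.text)
                if match is not None:
                    return match
    return None
```

With python-docx installed, this could be called as s = find_in_tables(Document(archpath), pat). It returns the first match in reading order, so a fixed cell such as table.cell(1, 2) is still the safer choice when several cells contain digits.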
Note that this method only works with .docx files (Word 2007 and later), since python-docx does not support the older .doc format.
So my next project is to implement a converter from .doc to .docx.
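One possible route for that converter, as a rough sketch only: LibreOffice can convert documents from the command line in headless mode. This assumes LibreOffice is installed and its soffice executable is on the PATH (on some systems the binary is called libreoffice instead). The names build_convert_command and convert_doc_to_docx are made up for illustration.

```python
import os
import subprocess


def build_convert_command(doc_path, out_dir):
    """Build the LibreOffice command line that converts one file to .docx."""
    return [
        "soffice",              # assumption: LibreOffice executable on PATH
        "--headless",           # run without opening a window
        "--convert-to", "docx",
        "--outdir", out_dir,
        doc_path,
    ]


def convert_doc_to_docx(doc_path, out_dir=None):
    """Convert doc_path to .docx and return the expected output path."""
    if out_dir is None:
        # Default to the folder the source file lives in
        out_dir = os.path.dirname(doc_path) or "."
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".docx")
```

The converted file could then be passed to Document() like any other .docx file. Splitting the command construction into its own function makes it easy to check or adjust the arguments without actually launching LibreOffice.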
Thank you, everyone, for your participation.