
Renaming .doc or .docx files with Python according to text from the document


I am trying to rename .doc or .docx files according to a certain piece of text inside each document.

I have been able to do this with .txt files, using the following code:

import os
import re

pat = r"ID number(\d{5})"   # Text to find in the file; group 1 becomes the new name
ext = '.txt'                # Type of file to process
mydir = ''                  # Path of the directory containing the files

for arch in os.listdir(mydir):
    if not arch.endswith(ext):          # Skip files of any other type
        continue
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:                       # No ID in this file, leave it alone
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):     # Do not overwrite an existing file
        os.rename(archpath, newpath)
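
To illustrate what the pattern captures, here is a minimal sketch run against a made-up sample string. The sample text and the ID value are hypothetical and not taken from a real file; the pattern is written as a raw string but is equivalent to the one above.

```python
import re

# Equivalent to the pattern above: "ID number" followed directly by five digits.
pat = r"ID number(\d{5})"

# Hypothetical file contents, invented for illustration.
sample = "Report\nID number12345\nEnd of report"

match = re.search(pat, sample)
new_name = match.group(1) if match else None
print(new_name)
```

Note that the pattern expects the digits to follow "ID number" directly; if the real files contain a space in between, something like `\s*` would have to be added before the group.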

Does anyone have any suggestions for doing the same with .doc or .docx files?


Solution

  • The answer was found. The issue was on my end: I was searching the document text for the value, but since the value is inside a table, I needed to specify the cell.

    Here is the result:

    import os
    import re
    from docx import Document   # Provided by the python-docx package

    pat = r"(\d+)"      # Value to extract; group 1 becomes the new file name
    ext = '.docx'       # Type of file to process
    mydir = ''          # Path of the directory containing the files

    for arch in os.listdir(mydir):
        if not arch.endswith(ext):                  # Skip files of any other type
            continue
        archpath = os.path.join(mydir, arch)
        document = Document(archpath)
        table = document.tables[0]                  # The value is in the first table
        s = re.search(pat, table.cell(1, 2).text)   # Row 1, column 2 (zero-based)
        if s is None:
            continue
        name = s.group(1)
        newpath = os.path.join(mydir, name + ext)
        if not os.path.exists(newpath):             # Do not overwrite an existing file
            os.rename(archpath, newpath)
            print(newpath)
    input("Press Enter to exit")
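
The code above relies on the value always being in cell (1, 2) of the first table. If the position can vary between documents, one option is to scan every cell of every table. The following is only a sketch under that assumption: `find_in_tables` is a hypothetical helper, and the stand-in objects merely mimic the `.tables`, `.rows`, `.cells` and `.text` attributes of python-docx so that the example runs without a real file.

```python
import re
from types import SimpleNamespace


def find_in_tables(document, pattern):
    """Return the first regex match found in any table cell, or None.

    `document` only needs to look like a python-docx Document: it has
    .tables, each table has .rows, each row has .cells, each cell has .text.
    """
    regex = re.compile(pattern)
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                match = regex.search(cell.text)
                if match:
                    return match
    return None


def fake_row(*texts):
    # Stand-in for a table row; the cell texts are invented for illustration.
    return SimpleNamespace(cells=[SimpleNamespace(text=t) for t in texts])


fake_doc = SimpleNamespace(tables=[SimpleNamespace(rows=[
    fake_row("Name", "Date", "ID"),
    fake_row("Report", "n/a", "ID 48213"),
])])

match = find_in_tables(fake_doc, r"(\d+)")
found_id = match.group(1) if match else None
print(found_id)
```

With python-docx installed, the same function should accept `Document(archpath)` in place of `fake_doc`.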
    

    Note that this method only works with .docx files, that is, files in the format used by Word 2007 and later. python-docx does not support the .doc files produced by earlier versions.
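
Since a file extension can be misleading, it may help to check the file itself before handing it to python-docx. A .docx file is a ZIP package that typically contains `word/document.xml`, whereas a legacy .doc file is not a ZIP archive. The sketch below uses only the standard library; `looks_like_docx` is a hypothetical helper and the check is a heuristic, not a full validation.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Heuristic: True if the file is a ZIP package holding word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        return "word/document.xml" in package.namelist()


# Demonstration with throwaway files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.docx")
    with zipfile.ZipFile(good, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")

    bad = os.path.join(tmp, "legacy.doc")
    with open(bad, "wb") as f:
        f.write(b"\xd0\xcf\x11\xe0 not a zip archive")

    good_result = looks_like_docx(good)
    bad_result = looks_like_docx(bad)

print(good_result, bad_result)
```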

    So my next project is to implement a converter from .doc to .docx.
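
One possible route for such a converter, which is not something from the original post, is to call LibreOffice in headless mode. The sketch below assumes LibreOffice is installed and that its `soffice` executable is on the PATH; both function names are hypothetical. Only the command is built and printed here, so the example runs even without LibreOffice.

```python
import os
import subprocess


def build_convert_command(doc_path, outdir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to .docx."""
    return [soffice, "--headless", "--convert-to", "docx",
            "--outdir", outdir, doc_path]


def convert_doc_to_docx(doc_path, outdir, soffice="soffice"):
    """Run the conversion and return the expected path of the new file.

    Assumes LibreOffice is installed; raises CalledProcessError on failure.
    """
    subprocess.run(build_convert_command(doc_path, outdir, soffice), check=True)
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(outdir, base + ".docx")


# Only show the command; the file and directory names are placeholders.
command = build_convert_command("report.doc", "converted")
print(command)
```

The converted files could then be passed to the renaming loop above.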

    Thank you, everyone, for participating.