Search code examples
pythontextdocxpython-docx

How can I batch this code on a folder of word files?


I have a developed a piece of code that I would like to run on a folder of docx files. I have successfully figured out how I can run this on one file (see code below) but am too green to figure out how to batch this.

I would like to write all docx files into the same output.txt file as opposed to writing to a separate txt file for each docx file.

Thank you for your help!

import docx

from docx import Document

def readtxtandtab(filename):
    doc = docx.Document(filename)
    fullText = []
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                fullText.append(cell.text)

    for para in doc.paragraphs:
        fullText.append(para.text)

    return '\n'.join(fullText)

with open("Output.txt", "a") as text_file:
    text_file.write(readtxtandtab('filename.docx'))

Solution

  • You can use glob to get all the word files in a directory. You can then use a for loop to execute your code on every file.

    import glob
    .
    .
    .
    with open("output.txt", "a") as text_file:
        for wordfile in glob.glob('*.docx'):
            text_file.write(readtxtandtab(wordfile))
    

    For glob the string *.docx means select anything that ends with .docx.