I have a developed a piece of code that I would like to run on a folder of docx files. I have successfully figured out how I can run this on one file (see code below) but am too green to figure out how to batch this.
I would like to write all docx files into the same output.txt file as opposed to writing to a separate txt file for each docx file.
Thank you for your help!
import docx
from docx import Document
def readtxtandtab(filename):
doc = docx.Document(filename)
fullText = []
for table in doc.tables:
for row in table.rows:
for cell in row.cells:
fullText.append(cell.text)
for para in doc.paragraphs:
fullText.append(para.text)
return '\n'.join(fullText)
with open("Output.txt", "a") as text_file:
text_file.write(readtxtandtab('filename.docx'))
You can use glob to get all the word files in a directory. You can then use a for loop to execute your code on every file.
import glob
.
.
.
with open("output.txt", "a") as text_file:
for wordfile in glob.glob('*.docx'):
text_file.write(readtxtandtab(wordfile))
For glob the string *.docx
means select anything that ends with .docx
.