
DOCX file to text file conversion using Python

I wrote the following code to convert my .docx files to text files. The text file ends up containing only the last paragraph of the document instead of the complete content. The code is as follows:

from docx import Document
import io
import os
import shutil

def convertDocxToText(path):
    for d in os.listdir(path):
        fileExtension = d.split(".")[-1]
        if fileExtension == "docx":
            docxFilename = path + d
            document = Document(docxFilename)

            # for printing the complete document
            print('\nThe whole content of the document:->>>\n')
            for para in document.paragraphs:
                x = para.text
                textFilename = path + d.split(".")[0] + ".txt"
                with io.open(textFilename, "w", encoding="utf-8") as textFile:
                    print(x)  # the complete content gets printed by this line
                    textFile.write(x)  # after writing to the text file, only the last paragraph is copied

path = "/home/python/resumes/"
convertDocxToText(path)


  • Problem

    The problem is in the innermost for loop of your code:

            for para in document.paragraphs:
                textFilename = path + d.split(".")[0] + ".txt"
                with io.open(textFilename, "w", encoding="utf-8") as textFile:

    For each paragraph in the document, you open the file named textFilename again. Say you have a file named MyFile.docx in /home/python/resumes/: textFilename is then /home/python/resumes/MyFile.txt on every iteration of the loop. Because you open that same file in "w" (write) mode each time, and write mode truncates the file when it is opened, every paragraph overwrites the previous one and only the last paragraph remains.
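The overwriting can be reproduced without python-docx. The following is a minimal sketch that uses made-up paragraph strings and a temporary directory (both are assumptions for illustration, not part of the original code); it reopens the same file in "w" mode on every iteration, just like the loop above:

```python
import os
import tempfile

# Hypothetical sample data standing in for document.paragraphs
paragraphs = ["first paragraph", "second paragraph", "third paragraph"]

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "MyFile.txt")

    # Reopening in "w" mode on every iteration truncates the file each time.
    for text in paragraphs:
        with open(text_path, "w", encoding="utf-8") as text_file:
            text_file.write(text)

    # Read the file back to see what is left after the loop.
    with open(text_path, encoding="utf-8") as text_file:
        leftover = text_file.read()

print(leftover)  # only the last paragraph survives
```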


    You must open the file once, outside that for loop, and then write the paragraphs to it one by one.
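One way to apply this is sketched below. The function names write_paragraphs and convert_docx_to_text are hypothetical, and writing a newline after each paragraph is an assumption about the desired output format. Document, document.paragraphs and para.text come from python-docx, as in the question's code.

```python
import os


def write_paragraphs(paragraphs, text_path):
    """Write each paragraph string on its own line, opening the file once."""
    with open(text_path, "w", encoding="utf-8") as text_file:
        for text in paragraphs:
            # Assumption: one paragraph per line in the output file.
            text_file.write(text + "\n")


def convert_docx_to_text(folder):
    """Convert every .docx file in folder to a .txt file next to it."""
    # Imported here so write_paragraphs stays usable without python-docx.
    from docx import Document

    for name in os.listdir(folder):
        base, extension = os.path.splitext(name)
        if extension.lower() != ".docx":
            continue
        document = Document(os.path.join(folder, name))
        # The output file is opened once per document, not once per paragraph.
        write_paragraphs(
            (para.text for para in document.paragraphs),
            os.path.join(folder, base + ".txt"),
        )
```

With the folder from the question, the call would be convert_docx_to_text("/home/python/resumes/"). Opening the file in "a" (append) mode inside the loop would also keep every paragraph, but it reopens the file for each paragraph and keeps adding to the output of earlier runs, so opening once is usually the simpler choice.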