When I save the docx file in python, data gets corrupted


I can edit and save .txt files without a problem, but when I save a .docx file the data gets corrupted (the original post included a screenshot of the error). Any suggestions for saving the .docx properly? Thanks.

    def save(self,MainWindow):
        if self.docURL == "" or self.docURL.split(".")[-1] == "txt":
            text = self.textEdit_3.toPlainText()
            file = open(self.docURL[:-4]+"_EDITED.txt","w+")
            file.write(text)
            file.close()
        elif self.docURL == "" or self.docURL.split(".")[-1] == "docx":
            text = self.textEdit_3.toPlainText()
            file = open(self.docURL[:-4] + "_EDITED.docx", "w+")
            file.write(text)
            file.close()

Solution

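  • .docx files are much more than text - they are actually ZIP archives containing a collection of XML files in a very specific format, so writing plain text into one with open() and write() produces a file that Word cannot read. To read and write them easily, you need the python-docx module. To adapt your code: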

    from docx import Document
    
    ...
    
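    elif self.docURL == "" or self.docURL.split(".")[-1] == "docx":
        text = self.textEdit_3.toPlainText()
        # Document() starts a new, empty document; add the edited text to it
        doc = Document()
        paragraph = doc.add_paragraph(text)
        # ".docx" is five characters long, so strip five before adding the suffix
        doc.save(self.docURL[:-5] + "_EDITED.docx")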
    

    and you're all set. There is much more you can do with text formatting, inserting images and shapes, creating tables, etc., but this will get you going. The python-docx documentation covers the rest.
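
To see why the original approach fails, here is a minimal sketch that uses only the standard library's zipfile module. It assumes, as is conventional for Word documents, that the main body is stored in an archive entry named word/document.xml. The helper looks_like_docx and the file names are hypothetical, made up for illustration. It is a rough container check, not a full validity test.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive with a word/document.xml entry.

    Hypothetical helper: it only checks the container shape, not whether
    Word could actually open the file.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    # Reproduce the question's approach: plain text saved under a .docx name.
    broken = os.path.join(tmp, "broken_EDITED.docx")
    with open(broken, "w") as handle:
        handle.write("Hello from the text editor")
    plain_text_result = looks_like_docx(broken)

    # A ZIP archive with the expected entry has the right container shape
    # (it is still a placeholder, not a complete document).
    shaped = os.path.join(tmp, "shaped.docx")
    with zipfile.ZipFile(shaped, "w") as archive:
        archive.writestr("word/document.xml", "<placeholder/>")
    zip_result = looks_like_docx(shaped)

print(plain_text_result, zip_result)  # prints: False True
```

The plain-text file fails the check because it is not a ZIP archive at all, which is why Word reports it as corrupted. A library such as python-docx builds the archive and its XML parts for you.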