Tags: python, python-3.x, python-docx

Python 3 - How to remove line/paragraph breaks


from docx import Document

doc = Document('example.docx')  # placeholder path; the original question does not show how doc is loaded
docIndex = 0  # assumed starting index; not shown in the original question

alphaDic = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','!','?','.','~',',','(',')','$','-',':',';',"'",'/']

while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k):None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = (firstSen.translate(rep_dic))
    removeSpaces = " ".join(translation.split())
    removeLineBreaks = removeSpaces.replace('\n','')
    doc.paragraphs[docIndex].text = removeLineBreaks

    docIndex += 1

I am trying to remove line breaks from the document, but it doesn't work. I still get:

Hello


There

rather than:

Hello
There

Solution

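  • I think what you want is to get rid of the empty paragraphs: setting a paragraph's text to '' leaves the paragraph itself in the document, where it shows up as a blank line. The following function can help. It deletes a paragraph that you don't want: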

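    def delete_paragraph(paragraph):
        # Detach the paragraph's underlying XML element from its parent.
        p = paragraph._element
        p.getparent().remove(p)
        # Clear the proxy's references so the stale object is not reused.
        paragraph._p = paragraph._element = None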
    

    Code by: Scanny*

    In your code, you could check whether translation is an empty string and, if it is, call delete_paragraph. Your code would then look like this:

    # Build the translation table once, outside the loop.
    rep_dic = {ord(k): None for k in alphaDic + [x.upper() for x in alphaDic]}

    docIndex = 0
    while docIndex < len(doc.paragraphs):
        paragraph = doc.paragraphs[docIndex]
        translation = paragraph.text.translate(rep_dic)
        if translation != '':
            paragraph.text = translation
            docIndex += 1
        else:
            # Deleting shifts the following paragraphs down by one,
            # so the index stays where it is.
            delete_paragraph(paragraph)
    

    *Reference: "feature: Paragraph.delete()"
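
One thing to watch for: translate() leaves spaces in place, so a paragraph such as 'a b' becomes ' ' rather than '' and would not be caught by the empty-string check above. A minimal sketch of one way to handle that, assuming whitespace-only results should also count as empty, is to reuse the whitespace-collapsing step from the question before testing. Plain strings stand in for paragraph text here, and cleaned_text is a hypothetical helper name, not part of python-docx.

```python
import string

# Same characters as alphaDic in the question: letters plus some punctuation.
chars = string.ascii_lowercase + "!?.~,()$-:;'/"
rep_dic = {ord(c): None for c in chars + chars.upper()}


def cleaned_text(text):
    """Strip the unwanted characters, then collapse runs of whitespace.

    Hypothetical helper: the name and the whitespace rule are assumptions.
    """
    translation = text.translate(rep_dic)
    return " ".join(translation.split())


# Plain strings standing in for paragraph.text values.
samples = ["a b", "Room 101", ""]
results = [cleaned_text(s) for s in samples]
```

In the loop, you would then compare cleaned_text applied to the paragraph's text against '' instead of the raw translation, and delete the paragraph when they are equal.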