from docx import Document
alphaDic = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','!','?','.','~',',','(',')','$','-',':',';',"'",'/']
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k): None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = firstSen.translate(rep_dic)
    removeSpaces = " ".join(translation.split())
    removeLineBreaks = removeSpaces.replace('\n', '')
    doc.paragraphs[docIndex].text = removeLineBreaks
    docIndex += 1
I am attempting to remove line breaks from the document, but it doesn't work. I am still getting

Hello

There

Rather than

Hello
There
I think what you want is to get rid of the empty paragraphs. The following function could help; it deletes a paragraph that you don't want:
def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    p._p = p._element = None
Code by: Scanny*
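To see why the `replace('\n', '')` step in the question has no visible effect, here is a small standard-library sketch. The sample string is made up for illustration and stands in for one paragraph's text. As far as I can tell, the gap between the two lines is a separate, empty paragraph rather than a newline character inside `.text`, so no string operation on a single paragraph can remove it.

```python
# Stand-in for one paragraph's text; the sample string is made up for illustration.
sample = "12  34\n56"

# str.split() with no arguments splits on any run of whitespace, newlines included.
collapsed = " ".join(sample.split())
print(repr(collapsed))

# Nothing is left for replace('\n', '') to remove, so the result is unchanged.
unchanged = collapsed.replace("\n", "") == collapsed
print(unchanged)

# An empty paragraph's text simply stays an empty string after the same steps.
empty_result = " ".join("".split())
print(repr(empty_result))
```

In other words, the cleanup can only produce an empty string for such a paragraph; the paragraph itself still has to be removed from the document.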
In your code, you could check whether translation is equal to '' and, if it is, call the delete_paragraph function. Your code would then look like this:
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k): None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = firstSen.translate(rep_dic)
    if translation != '':
        doc.paragraphs[docIndex].text = translation
    else:
        delete_paragraph(doc.paragraphs[docIndex])
        docIndex -= 1  # go one step back in the loop because of the deleted index
    docIndex += 1
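If the index bookkeeping feels fragile, the same idea could be sketched as a loop over a copy of the paragraph list. This is only a sketch under a few assumptions: `build_table` and `clean_paragraphs` are names I made up for illustration, `doc` is expected to be a python-docx `Document`, and `delete_paragraph` relies on the same `_element` / `getparent()` internals as the snippet above. It also collapses whitespace, so a paragraph left with only spaces is treated as empty.

```python
def build_table(chars):
    """Translation table mapping each character (lower and upper case) to None."""
    return {ord(c): None for ch in chars for c in (ch.lower(), ch.upper())}


def delete_paragraph(paragraph):
    """Detach the paragraph's XML element from its parent (python-docx internals)."""
    p = paragraph._element
    p.getparent().remove(p)


def clean_paragraphs(doc, table):
    """Strip unwanted characters and delete paragraphs that end up blank."""
    # Iterate over a copy so removing paragraphs does not disturb the loop.
    for paragraph in list(doc.paragraphs):
        cleaned = " ".join(paragraph.text.translate(table).split())
        if cleaned:
            # Assigning to .text replaces the paragraph's content with plain text.
            paragraph.text = cleaned
        else:
            delete_paragraph(paragraph)
```

With python-docx installed, this would be used roughly as `clean_paragraphs(doc, build_table(alphaDic))` after opening the document, followed by `doc.save(...)` with whatever output path you choose.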
*Reference: feature: Paragraph.delete()