I need to convert a Word document into HTML and then save it to a .txt file with lines no longer than 100 characters (a later process won't pick up more than 255 characters unless they're on separate lines).
So far, I've managed to convert the .docx file into HTML and write the result to a .txt file (a better solution is welcome). However, I can't figure out how to split the output into lines. Is there a built-in function that could do this?
import mammoth

with open(r'C:\Users\uXXXXXX\Downloads\Test_Script.docx', "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value  # The generated HTML
    messages = result.messages  # Any messages, such as warnings during conversion

with open(r'C:\Users\uXXXXXX\Downloads\Output.txt', 'w') as text_file:
    text_file.write(html)
In that case, you can just insert a newline after every 100 characters:
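html = "..."
i = 100  # index at which the next newline is inserted
while i < len(html):
    html = html[:i] + "\n" + html[i:]
    i += 101  # 100 characters of content plus the newline just inserted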
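As for a built-in function: the standard library's textwrap module may be closer to what the question asks for. The sketch below is one possible alternative, not part of the answer above. The helper name wrap_html is made up for illustration. It breaks at whitespace where it can, so words and tags are less likely to be cut in half, and it only splits a run of non-space characters when that run is itself longer than the limit. Note that textwrap normalizes whitespace (tabs and newlines become spaces, and whitespace at a break point is dropped), which is usually harmless for HTML but could matter inside <pre> blocks.

```python
import textwrap


def wrap_html(html, width=100):
    """Return html re-flowed so that no line is longer than `width` characters."""
    lines = textwrap.wrap(
        html,
        width=width,
        break_long_words=True,   # split a single run longer than `width`
        break_on_hyphens=False,  # do not break after hyphens inside words
    )
    return "\n".join(lines)


# Made-up sample input standing in for the HTML produced by the converter.
sample = "<p>" + "word " * 60 + "</p>"
wrapped = wrap_html(sample)
print(max(len(line) for line in wrapped.split("\n")))
```

The wrapped string can then be written to the output file with text_file.write(wrapped) in place of the unwrapped html.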