Search code examples
pythonhtmlmammoth

Python Writing huge string in text file with a new line character every 240 characters


I need to convert a word document into html code and then save it into a .txt file with lines of no longer than 100 characters (there's a process later on that won't pick up more than 255 characters if they're not in separate lines).

So far, I've successfully (though a better solution is welcome) managed to convert the .docx file into html and deploy that variable into a .txt file. However, I'm not able to figure out how to separate the lines. Is there any integrated function which could achieve this?

import mammoth

with open(r'C:\Users\uXXXXXX\Downloads\Test_Script.docx', "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value # The generated HTML
    messages = result.messages # Any messages, such as warnings during conversion
    
with open(r'C:\Users\uXXXXXX\Downloads\Output.txt', 'w') as text_file:
    text_file.write(html)

Solution

  • In that case, you can just do

    html = "..."
    i = 100
    while i < len(html):
        html = html[:i] + "\n" + html[i:]
        i += 101