I'm trying to turn a free ebook I found online into easily readable text for Kindle, with headers and full paragraphs.
I'm very new to Python and coding in general, so I haven't made any real progress yet.
Every line ends with a hard break (Enter), so Python treats each line as a separate paragraph.
Basically, I need to delete the extra spaces and breaks between the lines so the text doesn't break up when it is converted to MOBI or EPUB.
The text looks like this:
And should look like this:
Any help is welcome!
I used the python-docx library, which is not installed by default. You can install it with pip or conda:
pip install python-docx
conda install python-docx --channel conda-forge
After installing:
from docx import Document
doc = Document(r'path\to\file\pride_and_prejudice.docx')
# Each hard line break in the file is read as its own paragraph
all_text = []
for para in doc.paragraphs:
    all_text.append(para.text)
# Join with a space so the last word of one line doesn't fuse with the first word of the next
all_text_str = ' '.join(all_text)
clean_text = all_text_str.replace('\n', ' ')  # Replace any remaining line breaks with a space
clean_text = ' '.join(clean_text.split())  # Collapse runs of spaces into one; tweak as needed
document = Document()
p = document.add_paragraph(clean_text)
document.save(r'path\to\file\pride_and_prejudice_clean.docx')
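The code above puts the whole book into a single paragraph. If you want to keep the paragraph breaks, here is a rough sketch of one way to do it. It assumes that an empty line separates real paragraphs in your file, which may not be true for your ebook, so check first. The name merge_lines is just something I made up for this example, not part of python-docx.

```python
def merge_lines(lines):
    """Group consecutive non-empty lines into paragraphs.

    Assumption: an empty (or whitespace-only) line marks a paragraph boundary.
    """
    paragraphs = []
    current = []  # words collected for the paragraph being built
    for line in lines:
        words = line.split()  # split() also drops extra spaces
        if words:
            current.extend(words)
        elif current:
            # Blank line: close the paragraph collected so far
            paragraphs.append(' '.join(current))
            current = []
    if current:
        paragraphs.append(' '.join(current))
    return paragraphs


sample = [
    'It is a truth universally acknowledged, that a single man',
    'in possession of a good fortune,  must be in want of a wife.',
    '',
    'However little known the feelings or views of such a man',
    'may be on his first entering a neighbourhood,',
]
for paragraph in merge_lines(sample):
    print(paragraph)
```

With python-docx you could pass [para.text for para in doc.paragraphs] to merge_lines and then call document.add_paragraph(text) once per returned paragraph instead of once for the whole book.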