How do I remove the line break from text extracted from a PDF in Python?


I used PyMuPDF to extract the text from a PDF. Here is my code:

import fitz

pdf_document = "KRIP.pdf"
doc = fitz.open(pdf_document)

page1 = doc.load_page(0)  # snake_case name; loadPage is the deprecated camelCase alias
page1text = page1.get_text()
print("Text from PDF: ", page1text)

The output should be

KRIPTOGRAFI

but what it actually prints is

KRIPTOGRAFI
(followed by an extra blank line)

There is a line break after the word "KRIPTOGRAFI". Is there any way to remove it?


Solution

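  • You need to remove the trailing whitespace, which includes the newline character. The string method strip() does that for you: it removes whitespace from both ends of the string (use rstrip() if you only want to trim the end).

    Your new code would be: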

    import fitz
    
    pdf_document = "KRIP.pdf"
    doc = fitz.open(pdf_document)
    
    page1 = doc.load_page(0)
    page1text = page1.get_text().strip()
    print("Text from PDF: ", page1text)
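To see the line break instead of guessing at it, print the repr() of the string, which shows "\n" explicitly. The sketch below uses hard-coded sample strings rather than a real PDF, so it runs without KRIP.pdf; it assumes the extracted text ends with "\n", as described in the question. The function clean_extracted_text is a hypothetical helper for illustration, not part of PyMuPDF. It also shows a second option for when line breaks appear inside the text and not only at the end.

```python
def clean_extracted_text(text: str, join_lines: bool = False) -> str:
    """Tidy text returned by a PDF extractor such as page.get_text().

    Hypothetical helper for illustration; not part of PyMuPDF.
    """
    if join_lines:
        # split() with no arguments splits on any whitespace run (spaces, "\n", tabs),
        # so joining with one space turns internal line breaks into spaces.
        return " ".join(text.split())
    # Remove only leading and trailing whitespace; internal "\n" characters are kept.
    return text.strip()


if __name__ == "__main__":
    # Assumed sample: extracted text ending with a newline, as in the question.
    sample = "KRIPTOGRAFI\n"
    print(repr(sample))                        # 'KRIPTOGRAFI\n'
    print(repr(clean_extracted_text(sample)))  # 'KRIPTOGRAFI'
    print(repr(sample.rstrip("\n")))           # 'KRIPTOGRAFI' (trims only trailing newlines)

    # Assumed sample with a line break in the middle as well.
    multi = "first line\nsecond line\n"
    print(repr(clean_extracted_text(multi, join_lines=True)))  # 'first line second line'
```

strip() is usually enough when the only problem is the newline at the end; joining on split() is more aggressive because it also collapses repeated spaces, so use it only if you really want the text on one line.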