I used PyMuPDF to extract the text from a PDF. Here is my code:
import fitz
pdf_document = "KRIP.pdf"
doc = fitz.open(pdf_document)
page1 = doc.load_page(0)
page1text = page1.get_text()
print("Text from PDF: ", page1text)
The output should be:
KRIPTOGRAFI
but it turns out to be:
KRIPTOGRAFI
There is a line break after the word "KRIPTOGRAFI". Is there any way to remove it?
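You need to remove the trailing whitespace. The string method strip()
removes leading and trailing whitespace, including the newline at the end
of the extracted text. If you only want to trim the end, use rstrip() instead.
Your new code would be: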
import fitz
pdf_document = "KRIP.pdf"
doc = fitz.open(pdf_document)
page1 = doc.load_page(0)  # snake_case name; loadPage is the older, deprecated spelling
page1text = page1.get_text().strip()
print("Text from PDF: ", page1text)
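
To see what the trimming does without needing the PDF, here is a minimal sketch that uses a hard-coded string in place of the result of get_text(). The sample value and the variable names are assumptions for illustration only; the real text returned for your page may contain other whitespace as well.

```python
# Hypothetical stand-in for the string returned by page1.get_text().
raw_text = "  KRIPTOGRAFI\n"

# strip() removes whitespace (spaces, tabs, newlines) from both ends.
both_ends = raw_text.strip()

# rstrip() removes whitespace from the end only, keeping leading spaces.
end_only = raw_text.rstrip()

# rstrip("\n") removes only trailing newline characters.
newline_only = raw_text.rstrip("\n")

# repr() makes the otherwise invisible whitespace visible.
print(repr(raw_text))
print(repr(both_ends))
print(repr(end_only))
print(repr(newline_only))
```

Printing with repr() is also a quick way to check exactly which characters follow "KRIPTOGRAFI" in your own output. Note that none of these calls touch line breaks in the middle of the text, so a multi-line page keeps its internal structure.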