FPDF2 breaking url links that span multiple lines


When I convert text documents to PDFs in Python using the fpdf2 library, any URL that spans multiple lines gets broken apart: a space or a newline (I'm not sure which) is inserted partway through it.

I use the following code to convert each text file to a PDF file with fpdf2:

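import glob
import os

from fpdf import FPDF

# Directory containing the .txt files (placeholder; set to your own folder)
path = "."

# Collect every .txt file in that directory
txt_files = glob.glob(os.path.join(path, "*.txt"))

for txt_file in txt_files:
    print(txt_file)
    with open(txt_file, "r", encoding="utf-8") as infile:
        doc = infile.read()

    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("dejavu-sans", style="", fname="DejaVuSans.ttf")
    pdf.set_font(family="dejavu-sans", style="", size=12)
    # write() wraps text automatically, so a long URL can end up split across lines
    pdf.write(5, doc)
    # Save next to the source file, swapping the .txt extension for .pdf
    pdf.output(os.path.splitext(txt_file)[0] + ".pdf")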

My input text file looks like this: [image]

My generated PDF looks like this: [image]

I use the refextract Python library and get this extracted reference: [image]

At first I thought this was an issue with the refextract library, but when I select the hyperlink in the PDF file it looks like the problem is with fpdf2 breaking the URL apart (hovering over the URL in Adobe also shows only part of the address):

[image]

Does anyone know how to prevent anything from being inserted partway through a URL when converting text files to PDFs with fpdf2?

P.S. Sorry, I don't have enough reputation to post the images within the post (i.e. not via links - it's not that I don't know how).


Solution

  • If you want to render text without performing any wrapping, you can use FPDF.text():

    from fpdf import FPDF
    
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=14)
    pdf.text(pdf.x, pdf.y, "https://stackoverflow.com/questions/77930540/fpdf2-breaking-url-links-that-span-multiple-lines")
    pdf.output("fpdf2-breaking-url-links-that-span-multiple-lines.pdf")
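
If the text still needs to wrap, another option may be to pass each URL to write() as its own link target, so that the clickable link carries the whole address even when the visible text breaks across lines. The sketch below is only an illustration under stated assumptions: split_urls, write_with_links and URL_PATTERN are hypothetical helper names, and the regex assumes that URLs start with http:// or https:// and contain no whitespace (trailing punctuation would be swallowed into the URL). It relies on the optional link argument of FPDF.write(). It does not change how the visible text is wrapped, so tools that extract raw text, such as refextract, may still see the break.

```python
import re

# Assumption: a URL starts with http:// or https:// and runs until whitespace.
URL_PATTERN = re.compile(r"(https?://\S+)")


def split_urls(text):
    """Split text into (segment, is_url) pairs, keeping the original order."""
    parts = URL_PATTERN.split(text)
    # With one capturing group, re.split alternates: plain, URL, plain, URL, ...
    # so odd indices are URLs. Empty segments are dropped.
    return [(part, index % 2 == 1) for index, part in enumerate(parts) if part]


def write_with_links(pdf, text, line_height=5):
    """Write text via pdf.write(), passing each URL as its own link target.

    `pdf` is expected to be an fpdf.FPDF instance with a page added and a
    font already set.
    """
    for segment, is_url in split_urls(text):
        if is_url:
            # Third positional argument of write() is the link target.
            pdf.write(line_height, segment, segment)
        else:
            pdf.write(line_height, segment)
```

In the question's loop, this would mean calling write_with_links(pdf, doc) where pdf.write(5, doc) is called now.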