python reportlab pypdf

Why are PyPDF2 and reportlab removing spaces when inserting text?


I am trying to insert formatted text into the last page of my PDF, using the PyPDF2 and reportlab libraries with Python 2.7.

For some reason the text gets inserted without spaces, and with a new line after every character (not after every CRLF). Where did I go wrong, or is there a better way to do this?

Thanks.

PYTHON CODE:

# Libs
import io; # Needed for io.BytesIO below
from PyPDF2 import PdfFileWriter, PdfFileReader, PdfFileMerger;
from reportlab.pdfgen import canvas; # PDF Editor 1
from reportlab.lib.pagesizes import letter; # PDF Editor 2
from reportlab.lib.units import inch; # PDF Editor 3

uniOCRText = 'This is a test string.';

# Create a new PDF with Reportlab
packet = io.BytesIO();
can = canvas.Canvas(packet, pagesize=letter);

textobject = can.beginText();
textobject.setTextOrigin(inch, 2.5*inch);
textobject.setFont("Times-Roman", 10);
i = 0;
for line in uniOCRText:
    i = i + 1;
    print("i = " + str(i) + " - line = " + str(line));
    textobject.textLine(line); # Error here deletes spaces!!!
textobject.setFillGray(0.4);
can.drawText(textobject);
can.save();

# Move to the beginning of the BytesIO buffer
packet.seek(0);
new_pdf = PdfFileReader(packet);

# Add watermark
output = PdfFileWriter();

page = new_pdf.getPage(0);
output.addPage(page);

tempFolder = "Temp/TempPDF.pdf";
outputStream = open(tempFolder, "wb");
output.write(outputStream);
outputStream.close();

# Create a Merger PDF
merger = PdfFileMerger();
merger.append(PdfFileReader(open(pdfFileFromLoc, 'rb')));
merger.append(PdfFileReader(open(tempFolder, 'rb')));
merger.write(pdfFileDestLoc);

Solution

  • >>> for line in 'hello':
    ...     print(line)
    ... 
    h
    e
    l
    l
    o
    

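    You are iterating over characters, not lines. Iterating over a string always yields one character at a time; naming the loop variable line does not change that. As a result, every character is passed to textLine() separately and ends up on its own line in the PDF, and each space becomes a line that contains nothing visible. To iterate over lines, call splitlines() on the string and loop over the resulting list: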

    >>> for line in 'hello\nbye'.splitlines():
    ...     print(line)
    ... 
    hello
    bye
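
Applied to the loop in the question, the same idea might look like the sketch below. The helper write_lines and the RecordingText class are hypothetical names introduced here for illustration: RecordingText is only a stand-in that records textLine() calls, so the snippet runs without reportlab installed. In the real script you would pass the object returned by can.beginText() instead.

```python
def write_lines(textobject, text):
    """Call textLine() once per line of text, not once per character."""
    lines = text.splitlines()  # splits on \n, \r\n and \r
    for line in lines:
        textobject.textLine(line)
    return lines


class RecordingText(object):
    """Hypothetical stand-in for a reportlab text object; it only records calls."""

    def __init__(self):
        self.lines = []

    def textLine(self, text=''):
        self.lines.append(text)


recorder = RecordingText()
write_lines(recorder, 'This is a test string.\r\nSecond line.')
print(recorder.lines)
```

Each entry in recorder.lines is a whole line with its spaces intact. reportlab text objects also provide a textLines() method that accepts a multi-line string directly, which may be simpler if you do not need per-line processing.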