I have some code that takes a template PDF, creates a new PDF, overlays the new PDF on the template and writes the result to a stream, all using PDFBox 2.0.4.
The problem is that copy-pasting text from the generated PDF to a text editor results in garbage text.
This happens only for the text added by my code; the text in the original template still copies fine. The text I add is drawn with a custom font.
How do I fix the generated PDF so that the text can be copy-pasted?
SSCCE:
import static org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode.APPEND;

import java.awt.FontFormatException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;

import org.apache.pdfbox.multipdf.Overlay;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class PDFTest {
    private static final String FONT = "/fonts/font.ttf";

    public static void main(final String... args) throws IOException, FontFormatException {
        final Overlay overlay = new Overlay();
        overlay.setInputPDF(newDocument("Input text", 400));
        overlay.setAllPagesOverlayPDF(newDocument("Test text", 200));
        try (final PDDocument document = overlay.overlay(new HashMap<>())) {
            document.save("example.pdf");
        }
    }

    // Builds a one-page document containing the given text.
    private static PDDocument newDocument(final String text, final int offsetY) throws IOException, FontFormatException {
        final PDDocument document = new PDDocument();
        document.addPage(insertTextInPage(document, text, offsetY));
        return document;
    }

    // Loads the custom font into the document and draws the text on a new page.
    private static PDPage insertTextInPage(final PDDocument document, final String text, final int offsetY) throws IOException, FontFormatException {
        try (final InputStream fontStream = PDFTest.class.getResourceAsStream(FONT)) {
            final PDFont normalFont = PDType0Font.load(document, fontStream);
            final PDPage page = new PDPage();
            try (final PDPageContentStream contentStream = new PDPageContentStream(document, page, APPEND, false)) {
                addTextBlock(contentStream, normalFont, text, offsetY);
            }
            return page;
        }
    }

    private static void addTextBlock(final PDPageContentStream contentStream, final PDFont font, final String text, final int offsetY)
            throws IOException {
        contentStream.beginText();
        contentStream.setFont(font, 16);
        contentStream.newLineAtOffset(20, offsetY);
        contentStream.showText(text);
        contentStream.endText();
    }
}
This is a known issue (PDFBOX-3243). Files constructed with subsetted fonts (you are using PDType0Font.load(), which is very efficient) are in an intermediate state until they get saved, because that is when the subsetting takes place.
Solution for you: either save and reload each document, or save it to a dummy target. On Windows, where "nul" is the null device, I changed newDocument like this and it worked:
private static PDDocument newDocument(final String text, final int offsetY) throws IOException, FontFormatException
{
    final PDDocument document = new PDDocument();
    document.addPage(insertTextInPage(document, text, offsetY));
    document.save("nul"); // NEW! Saving is what triggers the font subsetting.
    return document;
}
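One caveat: "nul" only acts as a null device on Windows; on other systems that call would write a real file named nul. A portable variant, as a minimal sketch: PDDocument in PDFBox 2.x also has a save(OutputStream) overload, so the dummy save can go to a stream that throws everything away. DiscardingOutputStream is a hypothetical helper name of my own, not part of PDFBox, and on Java 11+ OutputStream.nullOutputStream() should do the same job. Only the "nul" version above is the one reported to work here, so treat this as an assumption to verify.

```java
import java.io.OutputStream;

/**
 * Hypothetical helper (not part of PDFBox): an OutputStream that discards
 * everything written to it, as a portable stand-in for the Windows-only "nul" file.
 *
 * Assumed usage inside newDocument, replacing document.save("nul"):
 *     document.save(new DiscardingOutputStream());
 */
final class DiscardingOutputStream extends OutputStream {
    // Number of bytes thrown away so far; only useful for sanity checks.
    private long discardedBytes;

    @Override
    public void write(final int b) {
        discardedBytes++;
    }

    @Override
    public void write(final byte[] buffer, final int offset, final int length) {
        // Keep the usual OutputStream contract for invalid ranges.
        if (offset < 0 || length < 0 || length > buffer.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        discardedBytes += length;
    }

    long getDiscardedBytes() {
        return discardedBytes;
    }
}
```

The byte counter is not needed for the fix itself; it just makes it easy to confirm that the save really wrote something into the sink.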