
PDFBox renderImageWithDPI produces images with missing content due to absent embedded fonts - how do I resolve this?

PDFBox's renderImageWithDPI only partially renders text, apparently because of missing embedded fonts.

  • Used PDFBox 2.0.28, then tried PDFBox 3.0.0-RC1

  • Created a PDDocument using Loader.loadPDF

  • Created a PDFRenderer from the PDDocument

  • Called renderImageWithDPI(pagenum, dpi, RGBObj) on the PDFRenderer

  • Obtained a java.awt.image.BufferedImage

  • Wrote it as a JPEG using javax.imageio.ImageIO

  • However, content is missing from the resulting images

  • Extracted two sample problematic pages from the PDF using PDFsam Basic

  • Page 1, which generates image 1

  • Page 2, which generates image 2

  • Highlighted the areas where content is missing.

  • Running PreflightParser.validate produces the messages below:

1.4 : Trailer Syntax error, /XRef cross reference streams are not allowed
5.2.2 : Forbidden field in an annotation definition, Flags of Link annotation are invalid
2.3.2 : Unexpected value for key in Graphic object definition, Unexpected 'true' value for 'Interpolate' Key
2.4.2 : Invalid Color space, The operator "k" can't be used with RGB Profile
2.4.3 : Invalid Color space, The operator "f" can't be used without Color Profile
3.1.4 : Invalid Font definition, ELWKFI+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, JECWGC+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, PHSMMZ+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, EHCNNL+OptimaLTStd-Italic: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, QBVSKF+HelveticaLTStd-Obl: The Charset entry is missing for the Type1 Subset
3.1.9 : Invalid Font definition, UBAPGG+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, UBAPGG+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, UBAPGG+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, ORMCFE+HelveticaLTStd-Obl: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, ORMCFE+HelveticaLTStd-Obl: The FontFile can't be read
3.1.9 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, TFEWKU+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, TFEWKU+HelveticaLTStd-Roman: The FontFile can't be read
3.1.4 : Invalid Font definition, CRQQXS+OptimaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, MVVAWX+InsigniaLTStd: The Charset entry is missing for the Type1 Subset
3.1.4 : Invalid Font definition, YIWFBD+OptimaLTStd-Bold: The Charset entry is missing for the Type1 Subset
3.1.11 : Invalid Font definition, JYHLHF+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.1.9 : Invalid Font definition, LDXBBC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LDXBBC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LDXBBC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FSNSYC+OptimaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FSNSYC+OptimaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FSNSYC+OptimaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, LVYKUL+InsigniaLTStd: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, LVYKUL+InsigniaLTStd: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, LVYKUL+InsigniaLTStd: The FontFile can't be read
3.1.9 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, FUYTUP+OptimaLTStd-Italic: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, FUYTUP+OptimaLTStd-Italic: The FontFile can't be read
3.1.9 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GZVYQO+OptimaLTStd-Bold: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GZVYQO+OptimaLTStd-Bold: The FontFile can't be read
3.1.9 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: mandatory CIDToGIDMap missing
3.1.11 : Invalid Font definition, GWNIWZ+HelveticaLTStd-Roman: The CIDSet entry is missing for the Composite Subset
3.2.3 : Font damaged, GWNIWZ+HelveticaLTStd-Roman: The FontFile can't be read
7.1 : Error on MetaData, Metadata is not a stream

These findings are corroborated by the warnings logged during rendering:

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GWNIWZ+HelveticaLTStd-Roman head is mandatory
    at org.apache.fontbox.ttf.TTFParser.parseTables(
    at org.apache.fontbox.ttf.TTFParser.parse(
    at org.apache.fontbox.ttf.OTFParser.parse(
    at org.apache.fontbox.ttf.OTFParser.parse(
    at org.apache.fontbox.ttf.TTFParser.parse(
    at org.apache.fontbox.ttf.OTFParser.parse(
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(
    at org.apache.pdfbox.pdmodel.PDResources.getFont(
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(

Additional warnings (truncated):

May 26, 2023 12:40:00 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font UBAPGG+OptimaLTStd head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font GZVYQO+OptimaLTStd-Bold head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FUYTUP+OptimaLTStd-Italic head is mandatory

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Could not read embedded OTF for font FSNSYC+OptimaLTStd head is mandatory
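To see which fonts PDFBox could not load from their embedded programs, a minimal sketch like the one below walks each page's resource dictionary and prints PDFont.isEmbedded() and PDFont.isDamaged() for every font. It assumes PDFBox 3.0 (Loader.loadPDF); with 2.0.x, PDDocument.load would be used to open the file instead. The class name FontReport is hypothetical, and the sketch only covers fonts referenced directly from page resources, not fonts inside form XObjects, patterns or annotations.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Hypothetical helper: reports page-level fonts and counts damaged ones.
public class FontReport {

    // Prints one line per font in each page's /Resources /Font dictionary
    // and returns how many of them PDFBox flagged as damaged.
    static int reportFonts(PDDocument document) throws IOException {
        int damaged = 0;
        int pageNumber = 0;
        for (PDPage page : document.getPages()) {
            pageNumber++;
            PDResources resources = page.getResources();
            if (resources == null) {
                continue; // page has no resource dictionary at all
            }
            for (COSName name : resources.getFontNames()) {
                // Loading the font here can log the same "Could not read embedded OTF" warnings
                PDFont font = resources.getFont(name);
                if (font == null) {
                    continue;
                }
                if (font.isDamaged()) {
                    damaged++;
                }
                // Non-embedded standard fonts are normal, so they are printed but not counted
                System.out.printf("page %d  %-8s %-40s embedded=%b damaged=%b%n",
                        pageNumber, name.getName(), font.getName(),
                        font.isEmbedded(), font.isDamaged());
            }
        }
        return damaged;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = Loader.loadPDF(new File(args[0]))) {
            System.out.println(reportFonts(document) + " damaged font(s)");
        }
    }
}
```

Running it against the two extracted sample pages should list fonts such as the HelveticaLTStd and OptimaLTStd subsets named in the warnings above, making it easy to match the damaged fonts to the highlighted areas.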

Fallback fonts appear to be used, but they don't help either:

May 26, 2023 12:40:01 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 findFontOrSubstitute WARNING: Using fallback font LiberationSans for CID-keyed TrueType font GWNIWZ+HelveticaLTStd-Roman

I also see the warning below and am unsure how to interpret or address it:

May 26, 2023 12:40:01 PM ensureDisplayProfile WARNING: ICC profile is Perceptual, ignoring, treating as Display class
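For reference, the steps described at the top of the question (load, render with renderImageWithDPI, write a JPEG, then run the preflight check) can be reproduced with a minimal sketch like the one below. It assumes PDFBox 3.0 (Loader.loadPDF and the static PreflightParser.validate) with the separate preflight module on the classpath; the class name RenderAndValidate, the page-N.jpg naming and the 150 DPI value are illustrative choices, not taken from the question.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical reproduction harness for the render and preflight steps.
public class RenderAndValidate {

    // Renders every page at the given DPI and writes page-N.jpg into outDir.
    static void renderToJpeg(File pdf, File outDir, float dpi) throws IOException {
        try (PDDocument document = Loader.loadPDF(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                // ImageType.RGB has no alpha channel, which suits the JPEG writer
                BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                File out = new File(outDir, "page-" + (i + 1) + ".jpg");
                if (!ImageIO.write(image, "jpg", out)) {
                    throw new IOException("No JPEG writer available for " + out);
                }
            }
        }
    }

    // Prints PDF/A-1b preflight findings in "code : details" form.
    static void printPreflight(File pdf) throws IOException {
        ValidationResult result = PreflightParser.validate(pdf);
        if (result.isValid()) {
            System.out.println("No PDF/A-1b violations reported");
            return;
        }
        for (ValidationError error : result.getErrorsList()) {
            System.out.println(error.getErrorCode() + " : " + error.getDetails());
        }
    }

    public static void main(String[] args) throws IOException {
        File pdf = new File(args[0]);
        printPreflight(pdf);
        renderToJpeg(pdf, new File("."), 150f);
    }
}
```

Note that preflight checks PDF/A-1b conformance, so many of its messages (XRef streams, link annotation flags, Interpolate) say nothing about whether a page renders; the font-related entries (3.1.x, 3.2.3) are the ones that line up with the rendering warnings.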

I need help with the following:

Question 1: How do I add a font?

  • The code block below, where I get a page and load a font before rendering, doesn't have any impact.
  • Note: getDocument(), setDocument() and setPdfRenderer() are convenience methods in my implementation class. setPdfRenderer() creates PDFRenderer renderer = new PDFRenderer(document); and stores it in a class variable.
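// imports: java.io.File, org.apache.fontbox.ttf.OTFParser, org.apache.fontbox.ttf.OpenTypeFont,
// org.apache.pdfbox.pdmodel.PDPage, org.apache.pdfbox.pdmodel.PDResources,
// org.apache.pdfbox.pdmodel.font.PDFont, org.apache.pdfbox.pdmodel.font.PDType0Font
int position = 0;
PDPage page = getDocument().getPage(position);
PDResources resources = page.getResources();

// Parse the downloaded OTF and load it into the document as a Type 0 font.
// This only creates a new font object: the page's content stream still selects
// its original fonts by resource name, so the rendered output does not change.
OTFParser otfParser = new OTFParser();
OpenTypeFont otf = otfParser.parse(new File("OptimaLTStd.otf"));
PDFont font = PDType0Font.load(getDocument(), otf, false);

if (position > 0) {
    // Re-insert the page before its predecessor, then rebuild the renderer
    PDPage prevPage = getDocument().getPage(position - 1);
    getDocument().getPages().insertBefore(page, prevPage);
    setPdfRenderer(getDocument());
}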
  • Downloaded OTF from link

Question 2: Is there an override in PDFRenderer to skip glyph processing, so that font-related issues do not affect image generation?


  • The missing text is caused by zero-width definitions for the fonts in the PDF, which incorrectly influence a "stretching" algorithm when rendering. This has been fixed in ticket PDFBOX-5611 and will be included in version 2.0.29. In the meantime, a snapshot build will be available.
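To confirm that the build actually on the classpath includes the fix, a small sketch can print the runtime version reported by org.apache.pdfbox.util.Version.getVersion() and compare it with 2.0.29. The class name and the isAtLeast helper are hypothetical. The helper ignores qualifiers such as -SNAPSHOT, assuming the snapshot mentioned above is versioned 2.0.29-SNAPSHOT, and the check only judges the 2.0.x line, since the answer does not say which 3.0 release carries the fix.

```java
import org.apache.pdfbox.util.Version;

// Hypothetical check that the running PDFBox 2.0.x is at least 2.0.29.
public class PdfboxVersionCheck {

    // Compares a dotted version such as "2.0.29" or "2.0.29-SNAPSHOT" with a minimum.
    // Any "-qualifier" suffix is ignored; missing components count as 0.
    static boolean isAtLeast(String version, int major, int minor, int patch) {
        if (version == null) {
            return false;
        }
        String numeric = version.split("-", 2)[0];
        String[] parts = numeric.split("\\.");
        int[] actual = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            actual[i] = Integer.parseInt(parts[i]);
        }
        int[] wanted = {major, minor, patch};
        for (int i = 0; i < 3; i++) {
            if (actual[i] != wanted[i]) {
                return actual[i] > wanted[i];
            }
        }
        return true; // all components equal
    }

    public static void main(String[] args) {
        String version = Version.getVersion(); // can be null if the version resource is missing
        System.out.println("PDFBox on classpath: " + version);
        if (version != null && version.startsWith("2.") && !isAtLeast(version, 2, 0, 29)) {
            System.out.println("Older than 2.0.29: the PDFBOX-5611 fix is not included.");
        }
    }
}
```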