Tags: pdf, pdf-generation, postscript

PDF Error reading a content stream


I'm working on capturing PostScript calls to show, storing the currentfont and font size so that I can output them in PDF text objects.

PDF file
Input PostScript program

But ImageMagick's identify is giving me errors:

$ identify pd0.pdf
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

pd0.pdf[0] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[1] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[2] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000

And Ghostscript's output isn't giving me the detail I need to understand the problem:

$ gsnd -dPDFDEBUG pd0.pdf
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
<<
/Root 1 0 R
/Size 12 >>
%Resolving: [1 0]
<<
/Type /Catalog /Pages 2 0 R
>>
endobj
%Resolving: [2 0]
<<
/Kids [
3 0 R
6 0 R
9 0 R
]
/Type /Pages /Count 3 >>
endobj
%Resolving: [3 0]
<<
/Parent 2 0 R
/Contents [
5 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F1 4 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [6 0]
<<
/Parent 2 0 R
/Contents [
8 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F2 7 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [9 0]
<<
/Parent 2 0 R
/Contents [
11 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F3 10 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [2 0]
Processing pages 1 through 3.
Page 1
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [5 0]
<<
/Length 15660 >>
stream
%FilePosition: 471
endobj
BT
F1
10.0 Tf
%Resolving: [4 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.
Page 2
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [8 0]
<<
/Length 31667 >>
stream
%FilePosition: 16474
endobj
BT
F2
10.0 Tf
%Resolving: [7 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.
Page 3
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [11 0]
<<
/Length 8335 >>
stream
%FilePosition: 48487
endobj
BT
F3
10.0 Tf
%Resolving: [10 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

GS>

Can anyone help me understand what the problem is with the pdf file I'm outputting?


Solution

  • There are a number of errors in the PDF. Depending on the PDF viewer in question, you need to fix a smaller or larger subset of them for the PDF to display as intended.

    page content streams

    The contents of the page content streams look like this:

    BT F1 10.0 Tf 30.0 750.0 Td (<< ) Tj ET BT F1 10.0 Tf 50.0 738.0 Td (/) Tj ET [...]
    

    The error here is in the font selection instructions:

    F1 10.0 Tf
    

    The font name operand F1 is not written as a PDF name object (recognizable by its leading slash, i.e. /F1) but as a bare token of the kind normally reserved for operators. The instruction should read /F1 10.0 Tf.
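    To illustrate the fix, here is a minimal Python sketch that always emits the font operand as a name object. The helper names pdf_name and set_font are hypothetical (not from any library); characters a PDF name cannot contain literally are escaped with the #xx notation.

```python
# Minimal sketch: serialize a resource name as a PDF name object and build
# a Tf (set text font and size) operator from it.
DELIMITERS = set(b"()<>[]{}/%")


def pdf_name(name: str) -> str:
    """Return name as a PDF name object, e.g. 'F1' -> '/F1'."""
    out = ["/"]
    for byte in name.encode("utf-8"):
        # Non-printable or non-ASCII bytes, delimiters and '#' itself must be
        # written as '#' followed by two hexadecimal digits.
        if byte < 0x21 or byte > 0x7E or byte in DELIMITERS or byte == ord("#"):
            out.append(f"#{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def set_font(resource_name: str, size: float) -> str:
    """Return a Tf instruction; its first operand must be a name object."""
    return f"{pdf_name(resource_name)} {size:.1f} Tf"


print(set_font("F1", 10.0))  # /F1 10.0 Tf
```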

    (As an aside, these content streams are unnecessarily bloated: most individual text objects draw only one to three glyphs, and each one has its own (always identical) font selection instruction. This is not an error per se, but it is completely unnecessary.)
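    A sketch of a leaner alternative, assuming all runs on a page share one font and size and contain only characters covered by the font's encoding: one BT/ET pair, a single Tf, and relative Td moves. The helpers text_object and pdf_string are hypothetical. Inside a single text object each Td offset is relative to the start of the previous line, so absolute positions are converted into deltas; parentheses and backslashes in string literals are escaped.

```python
from __future__ import annotations


def pdf_string(text: str) -> str:
    """Return text as a PDF literal string (escape backslash first)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def text_object(font: str, size: float, runs: list[tuple[float, float, str]]) -> str:
    """Build one text object; runs are (x, y, text) in absolute page coordinates."""
    lines = ["BT", f"/{font} {size:.1f} Tf"]  # font selected once, as a name
    prev_x, prev_y = 0.0, 0.0  # BT resets the text line matrix to the origin
    for x, y, text in runs:
        # Td moves relative to the start of the previous line.
        lines.append(f"{x - prev_x:g} {y - prev_y:g} Td {pdf_string(text)} Tj")
        prev_x, prev_y = x, y
    lines.append("ET")
    return "\n".join(lines)
```

    For example, text_object("F1", 10.0, [(30.0, 750.0, "<< "), (50.0, 738.0, "/")]) draws the first two runs of the page 1 stream shown above with a single font selection.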

    Furthermore, as already indicated by @usr2564301, the stream /Length values appear to be off by 1.
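    One way to avoid off-by-one lengths is to take /Length directly from the encoded bytes written between the stream and endstream keywords. A minimal sketch with a hypothetical helper pdf_stream_object: the end-of-line marker after stream and the one before endstream are not part of the counted data, and measuring a str instead of its encoded bytes would give wrong results for non-ASCII content.

```python
def pdf_stream_object(obj_num: int, content: bytes) -> bytes:
    """Serialize an indirect stream object whose /Length matches its data."""
    header = f"{obj_num} 0 obj\n<< /Length {len(content)} >>\nstream\n".encode("ascii")
    # The LF after 'stream' and the LF before 'endstream' are not counted in /Length.
    return header + content + b"\nendstream\nendobj\n"


obj = pdf_stream_object(5, b"BT /F1 10.0 Tf 30 750 Td (Hi) Tj ET")
print(obj.decode("ascii"))
```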

    font resources

    The font resources each look like this:

    <<
      /Type /Font 
      /SubType /Type1 
      /BaseFont /Palatino-Roman 
    >>
    

    First, there is an issue with what is there: as already indicated by @KenS, the correct key is Subtype, not SubType. PDF names are case-sensitive, so /SubType is simply an unknown key.

    There is another issue with what is not there: up to PDF 1.7, such minimal font dictionaries were allowed only for the standard 14 fonts, and in PDF 2.0 they are not allowed at all. As Palatino-Roman is not one of the standard 14 fonts, the resource is incomplete either way.

    According to Table 109 (Entries in a Type 1 font dictionary) in ISO 32000-2:

    • Type, Subtype, and BaseFont are universally Required,
    • FirstChar, LastChar, Widths, and FontDescriptor are Required, but in PDF 1.0-1.7 Optional for the standard 14 fonts,
    • Name is Required in PDF 1.0, Optional in PDF 1.1 through 1.7, Deprecated in PDF 2.0, and
    • Encoding and ToUnicode are universally Optional.

    Depending on the PDF viewer you try, the requirements may appear more relaxed, but any PDF processor may justifiably reject your PDFs if they do not meet the specification's requirements.

    cross references

    @usr2564301 also mentions that many cross reference table entries (as well as the startxref value, which points to the start of the cross reference table itself) are off by 1.

    They indeed point not to the object number or the xref keyword but to the white space just before it. Because white space before the number or keyword is skipped anyway, many PDF processors won't notice.
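    One way to make these offsets correct by construction is to record the current length of the output buffer immediately before writing each "N 0 obj" line and before the xref keyword. A minimal sketch under those assumptions: build_pdf is hypothetical, writes a classic cross reference table with a single subsection, numbers objects consecutively from 1, and assumes object 1 is the catalog.

```python
from __future__ import annotations


def build_pdf(objects: list[bytes]) -> bytes:
    """objects[i] is the body of object i + 1 (without 'N 0 obj' / 'endobj')."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # offset of the first digit of 'N 0 obj'
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_offset = len(out)  # offset of the 'x' in the 'xref' keyword
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # head of the free list, exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")  # each entry exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_offset}\n%%EOF\n").encode("ascii")
    return bytes(out)
```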