
Embedded JBIG2 Postscript stream not being rendered in PDF


I am learning to write Postscript by hand. I've taken a JBIG2 image (amb_1.jb2, from the examples at http://jbig2dec.sourceforge.net/ubc/main.html) and added it to a stream inside a PDF file.

The PDF in question is here: https://gist.github.com/brandonprry/277cbbc581be4e8eaa403a16403a6996

There are no errors opening it in any of the PDF readers I have tried, but the image isn't rendered.

What am I missing for rendering the embedded JBIG2 image stream (9 0 obj)? Using the MuPDF tool 'mutool info', it recognizes the PDF contains a JBIG2 image stream, but it still doesn't render it as far as I can tell.

./mutool info /media/psf/Home/tmp/testcases/0adcc9f8-c421-47d6-93ad-9f6efc2e360b.pdf 
/media/psf/Home/tmp/testcases/0adcc9f8-c421-47d6-93ad-9f6efc2e360b.pdf:

PDF-1.4
Info object (3 0 R):
<</CreationDate(D:20051122152833-05'00')/Creator(PdfCompressor 3.0.84)/Producer(CVISION Technologies)>>
Pages: 1

Retrieving info from pages 1-1...
Mediaboxes (1):
    1   (7 0 R):    [ 0 0 967.68 1728 ]

Fonts (3):
    1   (7 0 R):    Type1 'Helvetica' (4 0 R)
    1   (7 0 R):    Type1 'Times-Roman' (5 0 R)
    1   (7 0 R):    Type1 'Courier' (6 0 R)

Images (1):
    1   (7 0 R):    [ ASCIIHex JBIG2 ] 10x10 1bpc DevGray (9 0 R)

I've noticed this Stack Overflow post, which notes that the magic header is not supposed to be included in the embedded stream. The example above currently includes it.

jbig2 data in pdf is not valid jbig2 data. Wrong magic

With or without the 8-byte header in the JBIG2 stream, no errors are printed and no image is rendered.

Any thoughts are much appreciated.


Solution

  • Just to take things a bit further: your Page object lacks a Contents entry. From ISO 32000 (PDF), Table 30, "Entries in a page object":

    Contents | stream or array (Optional) | A content stream (see 7.8.2, "Content Streams") that shall describe the contents of this page. If this entry is absent, the page shall be empty.

    This explains why the document renders as an empty page. The Contents stream holds the operators that actually draw the page, as described in Chapter 8, "Graphics".

    At a minimum, the contents stream is likely to contain two instructions:

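    A cm (concatenate matrix) operator to scale and translate the image. An image is always mapped onto the unit square of user space, so without any scaling it is drawn at 0, 0 (bottom left) and is only 1 x 1 unit in size, whatever its pixel dimensions. A matrix such as 10 0 0 10 50 100 would instead draw it 10 units square at (50, 100).

    A Do operator to actually paint the image. Its operand is the name under which the image is registered in the XObject dictionary of the page's Resources.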

    Here's a sample Content stream that translates to (x,y) = (50, 100), then outputs the image.

    10 0 obj <<
      /Length 25
    >> stream
    1 0 0 1 50 100 cm
    /I0 Do
    endstream
    endobj
    

    (/Length is the length of the content stream data in bytes, here counting the final newline.)
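Rather than counting bytes by hand, the /Length value can be computed. The following is a minimal Python sketch, assuming LF line endings and that the trailing newline is treated as part of the stream data; the variable names are illustrative only.

```python
# Stream data exactly as it will sit between "stream\n" and "endstream".
content = b"1 0 0 1 50 100 cm\n/I0 Do\n"

# /Length is the number of bytes in the stream data.
length = len(content)

# Assemble the complete indirect object (object number 10, as in the sample).
obj = (
    b"10 0 obj <<\n  /Length %d\n>> stream\n" % length
    + content
    + b"endstream\nendobj\n"
)

print(length)  # 25
```

If the stream is edited later, for example to change the matrix, re-running this keeps /Length in step with the data.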

    This needs to be added as a /Contents entry to the existing Page object 7 0 R.

    7 0 obj <<
      /Type /Page
      /Contents 10 0 R
      /MediaBox [ 0 0 967.68 1728 ]
      /Parent 1 0 R
      /Resources 8 0 R
    >>
    endobj
    

    You'll also, of course, need to adjust the xref table and the trailer dictionary (its /Size entry) in the PDF, to account for the changed byte offsets and for 10 0 R (the Contents stream) as a new object.
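Recomputing the offsets by hand is error-prone, so it can help to generate the xref table. This is a rough Python sketch, not a general PDF writer: it assumes the objects are numbered consecutively from 1, all have generation 0, and are written in order with no incremental updates. `build_pdf` and its `root_ref` parameter are hypothetical names, and the reference to the document catalog has to be supplied by the caller.

```python
def build_pdf(objects, root_ref):
    """objects[i] holds the full text of object i+1 ("N 0 obj ... endobj\n").
    root_ref is the catalog reference as bytes, e.g. b"1 0 R" (an assumption)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for body in objects:
        offsets.append(len(out))  # byte offset at which this object starts
        out += body

    xref_pos = len(out)  # startxref must point at the "xref" keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes

    out += b"trailer\n<< /Size %d /Root %s >>\n" % (len(objects) + 1, root_ref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Usage with two placeholder objects.
demo = build_pdf(
    [b"1 0 obj\n<< /Type /Catalog >>\nendobj\n", b"2 0 obj\n<< >>\nendobj\n"],
    root_ref=b"1 0 R",
)
```

/Size is one more than the highest object number because entry 0 of the table is always the free-list head.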

    After making the above changes, I'm getting the following errors from xpdf:

    Syntax Error (1224): Unknown segment type in JBIG2 stream
    Syntax Error (34044): Unexpected EOF in JBIG2 stream
    

    There's still something wrong with the data in the JBIG2 stream that you need to work on.
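One possible cause, which has not been verified against amb_1.jb2: a standalone JBIG2 file header is longer than the 8-byte ID string. The ID is followed by a 1-byte flags field and, unless the "unknown number of pages" bit is set, a 4-byte page count, so removing only 8 bytes leaves the segment headers misaligned. A hedged Python sketch of stripping the whole header follows; `strip_jbig2_file_header` is a hypothetical helper, and it does not handle the other requirements for embedding (sequential organization, no end-of-page or end-of-file segments).

```python
# The 8-byte ID string that opens a standalone JBIG2 file.
JBIG2_ID = bytes([0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A])


def strip_jbig2_file_header(data: bytes) -> bytes:
    """Return data without the JBIG2 file header, if one is present."""
    if not data.startswith(JBIG2_ID):
        return data  # no file header found; leave the data untouched
    flags = data[8]
    # Bit 1 set means "number of pages unknown": the 4-byte count is absent.
    header_len = 9 if flags & 0x02 else 13
    return data[header_len:]


def ascii_hex(data: bytes) -> bytes:
    """Encode for an ASCIIHexDecode filter; '>' marks end of data."""
    return data.hex().encode("ascii") + b">"
```

Since mutool reports the filters as [ ASCIIHex JBIG2 ], the stripped bytes would then be passed through something like `ascii_hex` before being placed in the stream.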