pdf grammar bnf ebnf

Grammar of PDF 1.7 (BNF or variant)


I'm looking for a grammar of PDF 1.7 (BNF or a variant).

I could not find one by searching the web.


Solution

  • PDF is a binary format whose syntax is not context-free. For example, a stream object declares the size of its binary data in its dictionary, and you need to read and interpret that size before you can parse the stream.

    Example:

    10 0 obj
    <</Type /XObject
    /Subtype /Image
    /Width 260
    /Height 52
    /ColorSpace /DeviceRGB
    /SMask 10 0 R
    /BitsPerComponent 8
    /Filter /FlateDecode
    /Length 4570>> stream
    --- insert binary data here ---
    endstream
    endobj
    

    The binary data may itself contain the byte sequences endstream or endobj, so you cannot find the end of the stream by scanning for those keywords. You have no choice but to read the /Length value first and then consume exactly that many bytes.

    BNF can only describe context-free grammars, so it is not possible to write a complete BNF grammar for PDF.

    Take a look at the specification here: PDF Reference Document
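
The length-first reading described in the answer can be sketched roughly as follows. This is only an illustration, not a real PDF parser: the function name read_stream_payload is made up for this example, it assumes the whole object is already in memory as bytes, and it assumes /Length is a direct integer (PDF also allows an indirect reference such as /Length 12 0 R, which this sketch rejects rather than resolves).

```python
import re


def read_stream_payload(obj: bytes) -> bytes:
    """Return the raw stream bytes of a single PDF stream object.

    Hypothetical helper for illustration. Assumes a direct integer
    /Length and a dictionary that is immediately followed by `stream`.
    """
    # Locate the end of the dictionary and the `stream` keyword with its EOL.
    start = re.search(rb">>\s*stream(\r\n|\n)", obj)
    if start is None:
        raise ValueError("no stream keyword found")

    # Read /Length from the dictionary part only, never from the payload.
    # The \b and the lookahead reject indirect references like `12 0 R`.
    header = obj[:start.start()]
    m = re.search(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)", header)
    if m is None:
        raise ValueError("no direct /Length entry found")
    length = int(m.group(1))

    # Consume exactly `length` bytes, whatever they contain.
    begin = start.end()
    payload = obj[begin:begin + length]
    if len(payload) != length:
        raise ValueError("stream is shorter than /Length")

    # Only now is it safe to look for the closing keyword.
    rest = obj[begin + length:]
    if not rest.lstrip(b"\r\n").startswith(b"endstream"):
        raise ValueError("endstream not found after /Length bytes")
    return payload


# Demo: the payload deliberately contains the closing keywords.
demo_payload = b"\x00endstream\nendobj\x01"
demo_obj = (
    b"1 0 obj\n<</Length %d>> stream\n" % len(demo_payload)
    + demo_payload
    + b"\nendstream\nendobj\n"
)
```

Calling read_stream_payload(demo_obj) returns the full demo_payload, whereas a keyword scan such as demo_obj.split(b"endstream")[0] would cut the data off at the first embedded endstream.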