
Kaitai (KSY) - optional attribute


I'm trying to describe the SSH protocol in the Kaitai language (.ksy file). At the beginning, there is a protocol version exchange in the following format:

SSH-protoversion-softwareversion SP comments CR LF

where SP comments is optional. AFAIK, there is no way of describing an attribute as fully optional, only via an if condition. Does anybody know how to describe this relation in Kaitai, so that the parser also accepts this format: SSH-protoversion-softwareversion CR LF?

Thanks


Solution

  • Kaitai Struct is not designed to be what you would call a grammar in its traditional meaning (i.e. something mapping to a regular language, a context-free grammar, BNF, or something similar). Traditional grammars have the notion of "this element is optional" or "this element can be repeated multiple times", but KS works the other way around: it doesn't even attempt to solve the ambiguity problem, but rather builds on the fact that all binary formats are designed to be unambiguous.

    So, whenever you encounter something like an "optional element" or a "repeated element" without any further context, please pause and consider whether Kaitai Struct is the right tool for the task, and whether it's really a binary format you're trying to parse. For example, parsing something like JSON, XML or YAML might be theoretically possible with KS, but the result won't be of much use.

    That said, in this particular case it's perfectly possible to use Kaitai Struct; you just need to think about how a real-life binary parser would handle this. As I understand it, a real-life parser will read the whole line up to the CR byte, and then do a second pass to interpret the contents of that line. You can model that in KS with something like this:

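    meta:
      id: ssh_version_exchange # arbitrary name; ksc needs a top-level id
      encoding: UTF-8 # required by the type: str fields below (ASCII-compatible)
    seq: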
      - id: line
        terminator: 0xd # CR
        type: version_line
        # ^^^ this creates a substream with all bytes up to the CR byte (CR itself is consumed)
      - id: reserved_lf
        contents: [0xa]
    types:
      version_line:
        seq:
          - id: magic
            contents: 'SSH-'
          - id: proto_version
            type: str
            terminator: 0x2d # '-'
          - id: software_version
            type: str
            terminator: 0x20 # ' '
            eos-error: false
            # ^^^ if we don't find that space and just hit the end of the stream, that's fine
          - id: comments
            type: str
            size-eos: true
            # ^^^ if we still have some data in the stream, that's all comment
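To make the two-pass idea concrete outside of Kaitai, here is a hand-written Python sketch of roughly what the spec above does at parse time. It is not output of kaitai-struct-compiler: the helpers read_until, parse_version_line and parse_ident are hypothetical names, io.BytesIO stands in for the KSY substream, and comments are assumed to be UTF-8.

```python
import io


def read_until(stream: io.BytesIO, terminator: int, eos_error: bool = True) -> bytes:
    """Read up to `terminator`, consume it and leave it out of the result
    (like a KSY terminator with the default consume/include settings)."""
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b:  # end of stream reached before the terminator
            if eos_error:
                raise EOFError(f"terminator {terminator:#04x} not found")
            return bytes(buf)  # eos-error: false -> keep what was read
        if b[0] == terminator:
            return bytes(buf)
        buf += b


def parse_version_line(line: bytes) -> dict:
    """Second pass: interpret the bytes that preceded CR."""
    s = io.BytesIO(line)  # plays the role of the KSY substream
    if s.read(4) != b"SSH-":  # contents: 'SSH-'
        raise ValueError("missing 'SSH-' magic")
    proto_version = read_until(s, 0x2D).decode("ascii")  # up to '-'
    software_version = read_until(s, 0x20, eos_error=False).decode("ascii")  # up to ' ' or end
    comments = s.read().decode("utf-8")  # size-eos: whatever remains ('' if nothing)
    return {
        "proto_version": proto_version,
        "software_version": software_version,
        "comments": comments,
    }


def parse_ident(data: bytes) -> dict:
    """First pass: cut the line at CR, then check that LF follows."""
    s = io.BytesIO(data)
    line = read_until(s, 0x0D)  # terminator: 0xd
    if s.read(1) != b"\n":  # contents: [0xa]
        raise ValueError("expected LF after CR")
    return parse_version_line(line)


print(parse_ident(b"SSH-2.0-OpenSSH_8.9 Ubuntu\r\n"))
# -> {'proto_version': '2.0', 'software_version': 'OpenSSH_8.9', 'comments': 'Ubuntu'}
print(parse_ident(b"SSH-2.0-OpenSSH_8.9\r\n"))
# -> {'proto_version': '2.0', 'software_version': 'OpenSSH_8.9', 'comments': ''}
```

Both input formats go through the same code path: the only thing that changes is whether the space terminator is found before the end of the line substream.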
    

    If you want to get null instead of an empty string for comments when they're not included, just add an extra if: not _io.eof condition to the comments attribute.
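The if: not _io.eof variant only changes how comments is filled once the version fields have been consumed. A minimal Python sketch of that behaviour, working directly on the line bytes (everything before CR) and returning None when nothing is left; parse_version_line is a hypothetical name, not part of any Kaitai-generated API.

```python
def parse_version_line(line: bytes) -> dict:
    """Parse the bytes before CR; comments is None when nothing follows the version."""
    if not line.startswith(b"SSH-"):
        raise ValueError("missing 'SSH-' magic")
    pos = 4
    dash = line.index(b"-", pos)  # '-' terminator is mandatory (ValueError if absent)
    proto_version = line[pos:dash].decode("ascii")
    pos = dash + 1  # consume the '-'
    space = line.find(b" ", pos)
    if space == -1:  # eos-error: false -> no space, the version runs to the end
        software_version = line[pos:].decode("ascii")
        pos = len(line)
    else:
        software_version = line[pos:space].decode("ascii")
        pos = space + 1  # consume the ' '
    # if: not _io.eof -> only read comments when bytes remain in the substream
    comments = None if pos == len(line) else line[pos:].decode("utf-8")
    return {
        "proto_version": proto_version,
        "software_version": software_version,
        "comments": comments,
    }
```

Note that a line ending in a single space (b"SSH-2.0-OpenSSH_8.9 ") also yields None here, which matches what the if: not _io.eof condition would do, because the space terminator is consumed before the check.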