Parse an item depending on previous elements

To give some context, I am parsing a DICOM file, and having difficulties using the Transfer Syntax entry to determine whether to use implicit or explicit parsing. But let me define a simplified syntax, so no dicom knowledge is required.

We have a sequence of entries, each entry has a group number , and a data part. The group number is always represented as u2, but the data can be of different types, lets say u2 or u4. The order of the entries can be arbitrary, except that all the entries with group number == 2 must be at the top. All entries with group number == 2 has a data type of u2, but subsequent data parts can differ.

And here comes the hard part: all the items with group number != 2 have data type u4 if and only if an entry exactly like this was present previously:

(group, data) == (0x0002, 0x0101)

In python for example, I would parse it like this:

def read_entries(stream):
  is_u4 = False
  while not stream.eos():
    group = stream.read_u2()
    if group != 2 and is_u4:
      data = stream.read_u4()
    else:      
      data = stream.read_u2()
    if group == 2 and data == 0x0101:
      is_u4 = True
    yield (group, data)

Is there a way to achieve this using kaitai-struct?

Solution

Short answer

Exact transcription of Python code to KS per se is impossible right now, only by plugging in the code written in imperative language.

However, thanks to your additional info, an alternative approach is available, see below for solution.

Longer answer

Kaitai Struct emphasizes stateless parsing, thus everything we parse is effectively immutable (read-only), i.e. there are variables that can change its value during the parsing. Thus propagating is_u4 between parse cycles is non-trivial thing to do. We have similar problem with MIDI running status, for example.

Some of the solutions suggested by people sometimes is recursive type definition + propagation of instances using _parent syntax (see issue #70), but:

This is currently hindered by lack of type definition for recursively defined instance values + lack of initial value seeding
This generates a linked list of values, not an array (i.e. that's not what most people are comfortable with)
This takes heavy toll on call stack (i.e. monstrously long structures would most likely fail with stack exhaustion)

Alternative approach, given additional information you've provided might be viable. Actually, it turns out that whole stream of elements can be grouped in 3 groups:

elements before (0x0002, 0x0101) - they all use u2-u2 format
the (0x0002, 0x0101) element itself - note that it is also always u2-u2 when it's encountered for the first time
elements after (0x0002, 0x0101) - they all use u2-u4 format; even if additional (0x0002, 0x0101) are encountered afterwards, per the algorithm you've provided, it should be read as u2-u4, so it's ok.

Thus it possible to get away with:

types:
  elements:
    seq:
      - id: elements_2
        type: element_2
        repeat-until: _.is_u4
      - id: elements_4
        type: element_4
        repeat: eos
  element_2:
    seq:
      - id: group
        type: u2
      - id: data
        type: u2
    instances:
      is_u4:
        value: group == 2 and data == 0x0101
  element_4:
    seq:
      - id: group
        type: u2
      - id: data
        type: u4

Please tell me if I got it right and that will work for your project?