
Looping in DFDL


I am trying to convert a really complex fixed-length file into XML using DFDL and Daffodil. Each line corresponds to one element, and the first field of each line tells me what kind of element it is: Parent A, Parent B, or child AA, AB, BB, or BA.

Parent A is one element, Parent B is another, and Child AA is the first child of Parent A.

A single file contains multiple Parent A and Parent B records. I tried the dfdl:initiator property and also an xs:choice, but nothing seems to work. Can anyone please help me out?


Solution

  • It's difficult to give a complete answer without example data, but using initiators and choices is likely the right approach. There are potentially simpler schemas depending on the specific data, but a generic solution might look something like this:

    <xs:schema
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
    
      <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
    
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:format ref="GeneralFormat" lengthKind="delimited" />
        </xs:appinfo>
      </xs:annotation>
    
      <xs:element name="File">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Record" maxOccurs="unbounded">
              <xs:complexType>
                <xs:choice dfdl:initiatedContent="yes">
                  <xs:element name="ParentA" dfdl:initiator="ParentA:">
                    <xs:complexType>
                      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
                        <xs:element name="Content" type="xs:string"/>
                        <xs:element name="Record" maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:choice dfdl:initiatedContent="yes">
                              <xs:element name="ChildAA"  type="xs:string" dfdl:initiator="ChildAA:" />
                              <xs:element name="ChildAB"  type="xs:string" dfdl:initiator="ChildAB:" />
                            </xs:choice>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                  <xs:element name="ParentB" dfdl:initiator="ParentB:">
                    <xs:complexType>
                      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
                        <xs:element name="Content" type="xs:string" />
                        <xs:element name="Record" maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:choice dfdl:initiatedContent="yes">
                              <xs:element name="ChildBA" type="xs:string" dfdl:initiator="ChildBA:" />
                              <xs:element name="ChildBB" type="xs:string" dfdl:initiator="ChildBB:" />
                            </xs:choice>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:choice>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    
    </xs:schema>
    

    This schema has the following features:

    • Each File has an unbounded number of Records.
    • Each Record is a choice of either a ParentA or ParentB element, determined by the dfdl:initiator property.
    • Each Parent element contains the Content for that Parent (i.e., the text following the parent initiator), followed by an unbounded number of Child Records.
    • Each Child Record is also determined by the dfdl:initiator property.
    • A postfix newline separator marks the end of the Parent Content and of each Child's content.
    • This does not allow ChildB* elements to appear after a ParentA element, or ChildA* elements after a ParentB element: child elements must always follow their associated parent. (If this restriction isn't important, the schema could be greatly simplified.)
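
    As an illustration of that simplification, here is an untested sketch that drops the parent/child restriction while keeping the nesting. Both parents share one named complex type, `ParentType` (a name introduced here for illustration only), whose child Records may be any of the four child kinds. The format, initiators, and postfix newline separator are the same as in the schema above.

```xml
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="GeneralFormat" lengthKind="delimited" />
    </xs:appinfo>
  </xs:annotation>

  <!-- Hypothetical shared type: any child kind may follow either parent -->
  <xs:complexType name="ParentType">
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
      <!-- Text following the parent initiator, up to the newline -->
      <xs:element name="Content" type="xs:string" />
      <xs:element name="Record" maxOccurs="unbounded">
        <xs:complexType>
          <!-- The initiator selects the branch; all four children are allowed -->
          <xs:choice dfdl:initiatedContent="yes">
            <xs:element name="ChildAA" type="xs:string" dfdl:initiator="ChildAA:" />
            <xs:element name="ChildAB" type="xs:string" dfdl:initiator="ChildAB:" />
            <xs:element name="ChildBA" type="xs:string" dfdl:initiator="ChildBA:" />
            <xs:element name="ChildBB" type="xs:string" dfdl:initiator="ChildBB:" />
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="File">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Record" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice dfdl:initiatedContent="yes">
              <xs:element name="ParentA" type="ParentType" dfdl:initiator="ParentA:" />
              <xs:element name="ParentB" type="ParentType" dfdl:initiator="ParentB:" />
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

    With this version, a ChildBA: line directly after a ParentA: line would be accepted and nested under that ParentA, instead of causing the parse to fail.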

    The above allows data like this:

    ParentA:Parent A Content
    ChildAA:Child AA Content
    ChildAB:Child AB Content
    ParentB:Parent B Content
    ChildBB:Child BB Content
    ParentA:Parent A Content
    ChildAB:Child AB Content
    

    This data would parse into an XML infoset like this:

    <File>
      <Record>
        <ParentA>
          <Content>Parent A Content</Content>
          <Record>
            <ChildAA>Child AA Content</ChildAA>
          </Record>
          <Record>
            <ChildAB>Child AB Content</ChildAB>
          </Record>
        </ParentA>
      </Record>
      <Record>
        <ParentB>
          <Content>Parent B Content</Content>
          <Record>
            <ChildBB>Child BB Content</ChildBB>
          </Record>
        </ParentB>
      </Record>
      <Record>
        <ParentA>
          <Content>Parent A Content</Content>
          <Record>
            <ChildAB>Child AB Content</ChildAB>
          </Record>
        </ParentA>
      </Record>
    </File>
    

    The above was tested with Apache Daffodil 2.2.0.
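
    The question mentions fixed-length lines, while the schema above treats everything after an initiator as a single newline-terminated string. If a record's content is really made of fixed-width fields, one option is to give that record a complex type whose fields use dfdl:lengthKind="explicit". The following untested sketch shows how ChildAA inside ParentA might look; the field names Field1 and Field2 and their lengths are hypothetical placeholders, not taken from the question.

```xml
<!-- Hypothetical replacement for the ChildAA element inside ParentA.
     Field names and lengths are placeholders; adjust them to the real layout. -->
<xs:element name="ChildAA" dfdl:initiator="ChildAA:">
  <xs:complexType>
    <xs:sequence>
      <!-- First fixed-width field: exactly 10 characters -->
      <xs:element name="Field1" type="xs:string"
        dfdl:lengthKind="explicit" dfdl:length="10"
        dfdl:lengthUnits="characters" />
      <!-- Second fixed-width field: exactly 5 characters -->
      <xs:element name="Field2" type="xs:string"
        dfdl:lengthKind="explicit" dfdl:length="5"
        dfdl:lengthUnits="characters" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

    Because the enclosing ParentA sequence still uses the postfix %NL; separator, each such line would need to be ChildAA: followed by exactly 15 characters and then a newline.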