I am using an XML schema generator to create a schema for a population of XML documents that are produced by a certain system. I use this generated schema to generate Java code to read and write the XML. I make certain manual changes to the generated schema so that the generated Java code is more useful to me. For example, I add enumerations.
From time to time the system is updated and the format of the XML it produces changes. At these times, I need to re-generate the schema. Since I have made manual changes to the schema I previously produced, I need to compare my existing, hand-tuned schema to the newly generated schema and manually move the relevant changes over, so that I don't clobber my manual changes. I use Beyond Compare to do this.
The difficulty I have been having is that the structure of the generated schema is extremely contingent on certain characteristics of the XML documents I'm using to generate the schema -- characteristics that don't matter for my use case.
Every generated complex type has a sequence as its immediate child. That is fine. Sometimes those sequences have choice or sequence children in addition to the element children. By nesting these constructs, the generator is trying to make sure that if certain elements never appeared together in the source documents, they won't be allowed to appear together in the validated documents, and that if certain elements always appeared together in the source documents, they must always appear together in the validated documents.
I don't care about these things. I would rather have a simpler, more permissive schema where sequences don't have choice or sequence children unless they are really necessary.
I have tried to write XSLT to do so but it isn't working.
<xsl:template match="choice/sequence">
  <xsl:for-each select="*">
    <xsl:copy>
      <xsl:variable name="att" as="attribute() *">
        <xsl:attribute name="minOccurs">0</xsl:attribute>
      </xsl:variable>
      <xsl:apply-templates select="(@*[not(local-name() eq 'minOccurs')], $att)">
        <xsl:sort select="local-name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:for-each>
</xsl:template>
<xsl:template match="sequence/choice[not(@*)]">
  <xsl:for-each select="*">
    <xsl:copy>
      <xsl:variable name="att" as="attribute() *">
        <xsl:attribute name="minOccurs">0</xsl:attribute>
      </xsl:variable>
      <xsl:apply-templates select="(@*[not(local-name() eq 'minOccurs')], $att)">
        <xsl:sort select="local-name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:for-each>
</xsl:template>
<xsl:template match="sequence/sequence[@minOccurs eq '0']">
  <xsl:for-each select="*">
    <xsl:copy>
      <xsl:variable name="att" as="attribute() *">
        <xsl:attribute name="minOccurs">0</xsl:attribute>
      </xsl:variable>
      <xsl:apply-templates select="(@*[not(local-name() eq 'minOccurs')], $att)">
        <xsl:sort select="local-name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:for-each>
</xsl:template>
What this is trying to do is: where I have choice/sequence, remove the sequence so that its children become children of the choice instead (promoting them from grandchildren to children). To make the resulting schema validate all of the same documents as the original, it must also set minOccurs="0" on all the promoted elements.
It applies the same logic to sequence/choice, where the choice has no attributes. The predicate is intended to exclude choices that have maxOccurs="unbounded", for which this transformation would be illegal.
It also applies it to sequence/sequence, where the child sequence has minOccurs="0", for the same reason.
This works, except in one case in my generated schema I have a construct like this:
<xs:element name="blah1">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="blah2"/>
      <xs:choice>
        <xs:element ref="blah3"/>
        <xs:sequence>
          <xs:element ref="blah4"/>
          <xs:element ref="blah5"/>
        </xs:sequence>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
What I get out of this is:
<xs:element name="blah1">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="blah2"/>
      <xs:element minOccurs="0" ref="blah3"/>
      <xs:sequence minOccurs="0">
        <xs:element ref="blah4"/>
        <xs:element ref="blah5"/>
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
</xs:element>
It correctly eliminated the choice. But it didn't then eliminate the inner sequence, which I absolutely want it to do.
I am not an XSLT expert, but I thought that by using apply-templates in my own templates it would recursively apply the templates and that it should handle this case.
Try to use xsl:apply-templates directly on your child nodes, i.e. instead of code like
<xsl:template match="sequence/choice[not(@*)]">
  <xsl:for-each select="*">
    <xsl:copy>
use
<xsl:template match="sequence/choice[not(@*)]">
  <xsl:apply-templates/>
</xsl:template>
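You might need to use a mode to ensure you can fill in your minOccurs attribute, or perhaps you can write a template matching sequence/choice with the right predicate in the default mode that adds the attribute.
Currently, with the use of for-each, you are breaking part of the intent to recurse: xsl:copy inside the for-each copies each child as it is, so no template is ever matched against that child itself. Only its attributes and its own children are passed to apply-templates, which is why the sequence nested inside your choice is copied (with minOccurs="0" added) rather than flattened by the choice/sequence template.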
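To make the mode idea concrete, here is a minimal sketch of a complete XSLT 2.0 stylesheet. It rests on a few assumptions that are not in the question: the schema elements are matched with an xs: prefix bound to the XML Schema namespace (the question's unprefixed patterns suggest xpath-default-namespace is set instead), the mode name "optional" is made up for illustration, and the attribute sorting from the original templates is left out. The three match patterns are taken from the question as they are.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Identity template: copies anything no other template handles. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "optional" is a hypothetical mode name. An element reached in this
       mode was promoted out of a removed wrapper, so it is copied with
       minOccurs forced to 0. Its own content goes back to the default mode. -->
  <xsl:template match="*" mode="optional">
    <xsl:copy>
      <xsl:apply-templates select="@* except @minOccurs"/>
      <xsl:attribute name="minOccurs">0</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrappers to remove (same three patterns as in the question).
       mode="#all" lets the rule fire in the default mode and in the
       "optional" mode, so a wrapper nested inside another removed
       wrapper is removed too. No xsl:copy here: the wrapper itself is
       dropped and only its element children are emitted. -->
  <xsl:template mode="#all"
      match="xs:sequence/xs:choice[not(@*)]
           | xs:choice/xs:sequence
           | xs:sequence/xs:sequence[@minOccurs eq '0']">
    <xsl:apply-templates select="*" mode="optional"/>
  </xsl:template>

</xsl:stylesheet>
```

With the blah1 example from the question, the outer sequence is copied by the identity template, the choice is dropped, blah3 is emitted with minOccurs="0", and the inner sequence is reached through apply-templates in the "optional" mode, where the wrapper rule takes precedence over the generic `*` rule because its patterns have a higher default priority. So blah4 and blah5 should end up as direct children of the outer sequence, each with minOccurs="0". Note that `select="*"` in the wrapper rule skips comments and whitespace text nodes inside the removed wrappers.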