
Extending or redefining XSD complexType


I am trying to extend sitemap.xsd: I would like to add a new element (called 'crawl') to the tUrl complexType.

So I created sitemap-extended.xsd, in which I redefine sitemap.xsd and extend tUrl:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema 
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
elementFormDefault="qualified">

<xsd:redefine schemaLocation="sitemap.xsd">
    <xsd:complexType name="tUrl">
        <xsd:complexContent mixed="false">
            <xsd:extension base="tUrl">
                <xsd:sequence>
                    <xsd:element name="crawl" type="xsd:string" maxOccurs="1" />
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
</xsd:redefine>
</xsd:schema>

That works, BUT the generated XML is not valid (for example, Googlebot will not validate this XML because of an unknown element, crawl). So I think I need to use a different namespace to achieve this, but I have not found a solution.

I would like to be able to marshal/unmarshal this kind of XML:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
 xmlns:ext="http://www.mycompany.com/schema/myns"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
   http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
   http://www.mycompany.com/schema/myns http://www.mycompany.com/schema/myns/sitemap.xsd">
   <url>
     <loc>...</loc>
     <ext:crawl>...</ext:crawl>
      ...
   </url>
  </urlset>

Any ideas? Thanks.


Solution

  • There are a couple of issues here.

    Your modified XSD describes XML documents that cannot be validated by sitemap.xsd, since you're adding an element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, which is not allowed: the element wildcard in tUrl only accepts elements from other namespaces.

    The correct way is to move crawl into another namespace and then reference it.

    Extension XSD:

    <?xml version="1.0" encoding="utf-8" ?>
    <!-- XML Schema generated by QTAssistant/XSD Module (http://www.paschidev.com) -->
    <xsd:schema targetNamespace="http://tempuri.org/XMLSchema.xsd" xmlns="http://tempuri.org/XMLSchema.xsd" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="crawl" type="xsd:string"/>
    </xsd:schema>
    

    Modified redefining XSD:

    <?xml version="1.0" encoding="utf-8"?>
    <!--XML Schema generated by QTAssistant/XSR Module (http://www.paschidev.com)-->
    <xsd:schema xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:ext="http://tempuri.org/XMLSchema.xsd" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:redefine schemaLocation="../Standards/sitemap/sitemap.xsd">
        <xsd:complexType name="tUrl">
          <xsd:complexContent>
            <xsd:restriction base="tUrl">
              <xsd:sequence>
                <xsd:element name="loc" type="tLoc"/>
                <xsd:element name="lastmod" type="tLastmod" minOccurs="0"/>
                <xsd:element name="changefreq" type="tChangeFreq" minOccurs="0"/>
                <xsd:element name="priority" type="tPriority" minOccurs="0"/>
                <xsd:element ref="ext:crawl"/>
              </xsd:sequence>
            </xsd:restriction>
          </xsd:complexContent>
        </xsd:complexType>
      </xsd:redefine>
      <xsd:import namespace="http://tempuri.org/XMLSchema.xsd" schemaLocation="extending-or-redefining-xsd-complextype-ext.xsd"/>   
    </xsd:schema>
    

    Now, even though this is technically correct, a validator that doesn't load the extension XSD should fail validation of XML documents containing ext:crawl, simply because sitemap.xsd uses processContents="strict" for the element wildcard.

    I haven't tried it, but you could provide your extension XSD's URL via xsi:schemaLocation (although I would be surprised to see Google's validator following external schema location hints).
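
    To illustrate the strict-wildcard behaviour, here is a minimal sketch using the JDK's javax.xml.validation API. The inline main schema is a simplified stand-in for sitemap.xsd (an assumption: only loc plus the wildcard), and the class name and sample values are hypothetical. The same instance should be rejected when only the main schema is loaded and accepted once the extension schema is loaded too.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class StrictWildcardDemo {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";
    static final String EXT_NS = "http://tempuri.org/XMLSchema.xsd";

    // Simplified stand-in for sitemap.xsd (assumption, not the real file):
    // a url element whose type ends with a strict wildcard for other namespaces.
    static final String MAIN_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + SITEMAP_NS + "' xmlns='" + SITEMAP_NS + "'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='url' type='tUrl'/>"
      + "<xsd:complexType name='tUrl'><xsd:sequence>"
      + "<xsd:element name='loc' type='xsd:anyURI'/>"
      + "<xsd:any namespace='##other' minOccurs='0' maxOccurs='unbounded'"
      + " processContents='strict'/>"
      + "</xsd:sequence></xsd:complexType></xsd:schema>";

    // Extension schema declaring the global crawl element, as in the answer.
    static final String EXT_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + EXT_NS + "' elementFormDefault='qualified'>"
      + "<xsd:element name='crawl' type='xsd:string'/></xsd:schema>";

    // Sample instance (hypothetical values) using ext:crawl inside url.
    static final String INSTANCE =
        "<url xmlns='" + SITEMAP_NS + "' xmlns:ext='" + EXT_NS + "'>"
      + "<loc>http://example.com/</loc><ext:crawl>daily</ext:crawl></url>";

    // Returns true if xml validates against the given set of schema documents.
    static boolean isValid(String xml, String... schemas) throws Exception {
        Source[] sources = new Source[schemas.length];
        for (int i = 0; i < schemas.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemas[i]));
        }
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(sources);
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Validation errors surface as SAXException with the default error handler.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("main schema only: " + isValid(INSTANCE, MAIN_XSD));
        System.out.println("main + extension: " + isValid(INSTANCE, MAIN_XSD, EXT_XSD));
    }
}
```

    With processContents="strict", the validator needs a declaration for ext:crawl, which is why a consumer that only knows the main schema rejects the document.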

    If you plan to use the modified XSD with JAXB and expect to see a field for crawl in your generated class, you might be in for a surprise (last time I checked, redefines through restriction, at least, didn't work).

    You could still marshal/unmarshal XML containing ext:crawl; you would just have to handle that element manually, using a generic DOM node (XmlNode, or org.w3c.dom.Element in Java) instead.
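
    As a rough sketch of handling ext:crawl by hand, the following uses only the JDK's DOM API rather than JAXB itself. The class and method names are hypothetical, and the namespace is the one from the extension XSD above. With JAXB, elements matched by the wildcard would typically reach you as DOM elements that can be treated the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CrawlDomDemo {
    static final String EXT_NS = "http://tempuri.org/XMLSchema.xsd";

    // Parses xml with namespace awareness turned on.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Reads the text of the first ext:crawl element, or null if there is none.
    static String readCrawl(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(EXT_NS, "crawl");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Builds a new ext:crawl element owned by doc; the caller appends it where needed.
    static Element newCrawl(Document doc, String value) {
        Element crawl = doc.createElementNS(EXT_NS, "ext:crawl");
        crawl.setTextContent(value);
        return crawl;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<url xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'"
          + " xmlns:ext='" + EXT_NS + "'>"
          + "<loc>http://example.com/</loc><ext:crawl>daily</ext:crawl></url>";
        Document doc = parse(xml);
        System.out.println(readCrawl(doc));
        // Append a second crawl element to show the write direction.
        doc.getDocumentElement().appendChild(newCrawl(doc, "weekly"));
        System.out.println(doc.getElementsByTagNameNS(EXT_NS, "crawl").getLength());
    }
}
```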