
How does (Perl) Xerces validation access HTTP schemas?


This self-contained example (pathname: /root/stef/test.pl) works fine on one server, A-OK, but fails on another server, B-NOK.

      1 use strict;
      2 use XML::Validate::Xerces;
      3
      4 sub main {
      5     my $rsep = $/;
      6     undef $/;
      7     my $xml = <DATA>;
      8     $/ = $rsep;
      9
     10     warn "working on this xml:[\n$xml]";
     11
     12     my %options;
     13     my $validator = new XML::Validate::Xerces(%options);
     14     my $valid = $validator->validate($xml) ? '' : 'in';
     15     warn "Document is ${valid}valid\n";
     16 }
     17
     18 main();
     19
     20 __DATA__
     21 <?xml version="1.0"?>
     22 <note
     23   xmlns="https://www.w3schools.com"
     24   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     25   xsi:schemaLocation="https://www.w3schools.com http://www.w3schools.com/xml/note.xsd">
     26   <!--
     27   xsi:schemaLocation="https://www.w3schools.com file:///root/stef/note.xsd">
     28   -->
     29   <to>Tove</to>
     30   <from>Jani</from>
     31   <heading>Reminder</heading>
     32   <body>Don't forget me this weekend!</body>
     33 </note>

If I swap lines 25 and 27 (i.e. I change http://www.w3schools.com/xml/note.xsd to file:///root/stef/note.xsd), so that the schema is a local file, it works fine even on B-NOK.

So the only difference seems to be whether the schema is fetched over HTTP.

Note that the local file was downloaded with wget http://www.w3schools.com/xml/note.xsd, so I know not only that the local and remote schema contents are identical, but also that outbound HTTP on port 80 works for fetching remote files.

I didn't do anything special to get Xerces working over HTTP (port 80) on server A-OK, so I would expect not to need anything on server B-NOK either for Xerces to go out and fetch the schema.

I couldn't find clear information on whether or how Xerces must be told to use HTTP, nor on which built-in mechanism it uses to download URLs the way wget does, so I can't tell whether I need to set some configuration variables. The admin of server B-NOK told me he saw nothing attempting to reach http://www.w3schools.com apart from the manual wget. That suggests Xerces isn't even trying to fetch that URL.

Thank you in advance for any hint.


Solution

  • From the source:

    if ($strict) {
            TRACE("Using strict validation");
            $DOMparser->setValidationScheme("$XML::Xerces::AbstractDOMParser::Val_Auto");
            $DOMparser->setIncludeIgnorableWhitespace(0);
            $DOMparser->setDoSchema(1);
            $DOMparser->setDoNamespaces(1);
            $DOMparser->setValidationSchemaFullChecking(1);
            $DOMparser->setLoadExternalDTD(1);
            $DOMparser->setExitOnFirstFatalError(1);
            $DOMparser->setValidationConstraintFatal(1);
    } else {
            TRACE("Using no validation");
            $DOMparser->setValidationScheme("$XML::Xerces::AbstractDOMParser::Val_Never");
            $DOMparser->setDoSchema(0);
            $DOMparser->setDoNamespaces(0);
            $DOMparser->setValidationSchemaFullChecking(0);
            $DOMparser->setLoadExternalDTD(0);
    }
    

    Notice that setLoadExternalDTD, like the schema and validation settings around it, is only enabled in strict validation mode.

    Using the following should do the trick:

    my $validator = XML::Validate::Xerces->new( strict_validation => 1 );
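
To see why a document is rejected, and not only that it is, it may help to print the error recorded by the last validate call. The following is a minimal sketch, assuming that the last_error method documented for XML::Validate returns a hash ref with message, line and column keys. The helper names format_error and check_xml and the file name check.pl are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Turn the hash ref returned by last_error() into a one-line message.
# Assumption: the keys are message, line and column, as documented
# for XML::Validate.
sub format_error {
    my ($err) = @_;
    return 'unknown error' unless ref $err eq 'HASH';
    my $msg  = defined $err->{message} ? $err->{message} : 'no message';
    my $line = defined $err->{line}    ? $err->{line}    : '?';
    my $col  = defined $err->{column}  ? $err->{column}  : '?';
    return "$msg (line $line, column $col)";
}

# Validate an XML string with strict validation requested explicitly.
# Returns an empty string on success, otherwise the formatted error.
sub check_xml {
    my ($xml) = @_;
    # Loaded lazily so the helper above compiles without the module.
    require XML::Validate::Xerces;
    my $validator = XML::Validate::Xerces->new( strict_validation => 1 );
    return '' if $validator->validate($xml);
    return format_error( $validator->last_error );
}

# Usage: perl check.pl document.xml   (check.pl is a hypothetical name)
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $xml = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    my $error = check_xml($xml);
    print $error eq ''
        ? "Document is valid\n"
        : "Document is invalid: $error\n";
}
```

Running this on both servers with the same document should show whether B-NOK reports a schema-loading problem or some other validation error.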