
XML::LibXML - XPath - namespace


I have the following XML file, t.xml:

<?xml version="1.0"?>
<ArrayOfFiles xmlns="Our.Files" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <File>
                <DownloadCount>1</DownloadCount>
                <Id>11</Id>
        </File>
        <File>
                <DownloadCount>2</DownloadCount>
                <Id>22</Id>
        </File>
</ArrayOfFiles>

The xmlns declaration looks invalid; xmlstarlet complains about it. For example, running:

xmlstarlet sel -t -v "//File/Id" t.xml

prints

t.xml:2.32: xmlns: URI Our.Files is not absolute
<ArrayOfFiles xmlns="Our.Files" xmlns:i="http://www.w3.org/2001/XMLSchema-instan

Probably for the same reason, I can't get the following Perl code to work either:

use 5.014;
use warnings;
use XML::LibXML;

my $dom = XML::LibXML->new->parse_file('t.xml');
my $res = $dom->findnodes('//File/Id');
say $_->textContent for $res->get_nodelist;

When I omit the xmlns declarations, i.e. when I parse this modified XML file:

<?xml version="1.0"?>
<ArrayOfFiles>
    <File>
        <DownloadCount>1</DownloadCount>
        <Id>11</Id>
    </File>
    <File>
        <DownloadCount>2</DownloadCount>
        <Id>22</Id>
    </File>
</ArrayOfFiles>

the above code does what I mean and prints:

11
22

The question is: how can I parse the original XML file? It is downloaded from an external site, so I have to deal with it somehow.


Solution

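  • That's just a warning; the document still parses. The query returns nothing because every element is in the default namespace Our.Files, and an unprefixed name in an XPath 1.0 expression matches only elements in no namespace. When working with XML namespaces, register a prefix with XML::LibXML::XPathContext and use it in the expression: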

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    
    use XML::LibXML;
    use XML::LibXML::XPathContext;
    
    
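    # Parse the file named by the first command-line argument.
    my $dom = 'XML::LibXML'->load_xml(location => shift);
    
    # Bind the prefix "o" to the document's default namespace URI.
    my $xpc = 'XML::LibXML::XPathContext'->new($dom);
    $xpc->registerNs(o => 'Our.Files');
    
    # Every element name in the path needs the registered prefix.
    my $res = $xpc->findnodes('//o:File/o:Id');
    say $_->textContent for $res->get_nodelist;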
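
To see the difference side by side, here is a minimal self-contained sketch that parses an inline copy of t.xml instead of reading a file. The variable names (@plain, @prefixed, @by_local) are made up for illustration, and the local-name() query is an alternative not mentioned above: it ignores namespaces entirely, which can be handy for one-off scripts but will also match same-named elements from any other namespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };

use XML::LibXML;
use XML::LibXML::XPathContext;

# Inline copy of t.xml so the sketch runs without external files.
my $xml = <<'XML';
<?xml version="1.0"?>
<ArrayOfFiles xmlns="Our.Files" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <File><DownloadCount>1</DownloadCount><Id>11</Id></File>
    <File><DownloadCount>2</DownloadCount><Id>22</Id></File>
</ArrayOfFiles>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Unprefixed names match only elements in no namespace, so this finds nothing.
my @plain = $dom->findnodes('//File/Id');

# Register a prefix for the default namespace and use it in every step.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(o => 'Our.Files');
my @prefixed = map { $_->textContent } $xpc->findnodes('//o:File/o:Id');

# Alternative (assumption: namespace-agnostic matching is acceptable here):
# compare only the local part of each element name.
my @by_local = map { $_->textContent }
    $dom->findnodes(q{//*[local-name()='File']/*[local-name()='Id']});

say scalar @plain;    # number of nodes found by the unprefixed query
say "@prefixed";      # Ids found via the registered prefix
say "@by_local";      # Ids found via local-name()
```

The first line printed should be 0, followed by "11 22" twice. The prefix "o" is arbitrary; only the URI passed to registerNs has to match the xmlns value in the document exactly.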