
Using XML::LibXML to find and replace specific portions of a CableLabs 1.0 XML file


I tried XML::Simple, but because it just reads the XML into a hash, the output is useless when run against the DTD. I learned that the hard way.

So I have adopted XML::LibXML. The funny thing is that the requirements I found most difficult to accomplish with XML::Simple were the easiest here. However, some of the things that were easier in XML::Simple are proving to be impossible (given my lack of understanding of the DOM and some confusing behavior in XML::LibXML).

So here is a sample of the XML:

    <Metadata>
        <ADI Name="movie" />
        <App_Data Name="Something I don't care about" value="who cares" />
        <App_Data Name="Something I don't care about as well" value="who cares" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="photo" />
        <App_Data Name="Something I don't care about" value="who cares" />
        <App_Data Name="Something I don't care about as well" value="who cares" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="poster" />
        <App_Data Name="Something I don't care about" value="who cares" />
        <App_Data Name="Something I don't care about as well" value="who cares" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>

Note: I have simplified this for use in this post.

So basically I have to use the Name attribute of the <ADI> tag to confirm that I am in the correct area of the DOM, and then change the Value attribute of the <App_Data> tag whose Name is ChangeMe.

This is the snippet of code that I have come up with... and failed miserably.

    #!/usr/bin/perl

    use strict;
    use XML::LibXML;

    my $xml2 = XML::LibXML->new();
    my $data = $xml2->parse_file("adi.xml");
    my $movie;
    my $photo;
    my $poster;

    foreach my $test ($data->findnodes('//Metadata')) {
        if ($test->findvalues('./ADI/@Name[.="movie"]')){
            $movie = 1;
            undef $photo;
            undef $poster;
        }
        elsif ($test->findvalues('./ADI/@Name[.="photo"]')){
            undef $movie;
            $photo = 1;
            undef $poster;
        }
        elsif ($test->findvalues('./ADI/@Name[.="poster"]')){
            undef $movie;
            undef $photo;
            $poster = 1;
        }
    }

I don't have anything beyond this because it doesn't work. I get an error along the lines of:

    Can't locate object method "findvalues" via package "XML::LibXML::Element"

As a bonus to this question: what if I wanted to completely remove the <Metadata> elements (and all their children) that contain photo and/or poster?


Solution

  • Give this a try for starters.

    #!/usr/bin/perl
    
    use strict;
    use XML::LibXML;
    
    my $xml2 = XML::LibXML->new();
    my $data = $xml2->parse_file("adi.xml");
    
    foreach my $test ($data->findnodes('//Metadata')) {
        if ($test->findnodes('./ADI/@Name[.="movie"]')){
            print "movie\n";
        }
        elsif ($test->findnodes('./ADI/@Name[.="photo"]')){
            print "photo\n";
        }
        elsif ($test->findnodes('./ADI/@Name[.="poster"]')){
            print "poster\n";
        }
    }
    

    There is no findvalues method (XML::LibXML does have findvalue, singular, which returns the string value of an XPath match). What you want here is findnodes, which returns a list of the nodes matching the XPath expression. Once you have that list, you can iterate over it and extract whatever data you need, much like you're already doing for Metadata.
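
As a rough sketch of iterating over what findnodes returns and pulling data out of each node: the inline XML is a trimmed copy of the sample, and the variable names (@kinds, @names) are made up for illustration. It also uses findvalue (singular) to read the ADI Name as a plain string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed, inline copy of the sample so the sketch is self-contained.
my $xml = <<'XML';
<root>
    <Metadata>
        <ADI Name="movie" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="photo" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
</root>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @kinds;
foreach my $metadata ($doc->findnodes('//Metadata')) {
    # findvalue (singular) returns the string value of the XPath result.
    my $kind = $metadata->findvalue('./ADI/@Name');

    # findnodes in list context returns the matching element nodes,
    # and getAttribute reads one attribute from each of them.
    my @names = map { $_->getAttribute('Name') } $metadata->findnodes('./App_Data');

    push @kinds, $kind;
    print "$kind: ", join(', ', @names), "\n";
}
```

Using the string from findvalue in an eq comparison or a hash lookup may be simpler than chaining one findnodes test per possible Name.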

    Also, I'm assuming your XML file has a single root-level element. I used the modified version below to test the above code.

    <root>
        <Metadata>
            <ADI Name="movie" />
            <App_Data Name="Something I don't care about" value="who cares" />
            <App_Data Name="Something I don't care about as well" value="who cares" />
            <App_Data Name="ChangeMe" Value="" />
        </Metadata>
        <Metadata>
            <ADI Name="photo" />
            <App_Data Name="Something I don't care about" value="who cares" />
            <App_Data Name="Something I don't care about as well" value="who cares" />
            <App_Data Name="ChangeMe" Value="" />
        </Metadata>
        <Metadata>
            <ADI Name="poster" />
            <App_Data Name="Something I don't care about" value="who cares" />
            <App_Data Name="Something I don't care about as well" value="who cares" />
            <App_Data Name="ChangeMe" Value="" />
        </Metadata>
    </root>
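
For the original goal (changing Value on the ChangeMe entry, but only inside the movie block), one possible approach is to let a single XPath predicate do the "am I in the right Metadata" check. This is only a sketch: the XML is inlined from the test file above in shortened form, and 'new value' is a placeholder for whatever you actually need to write.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Shortened inline copy of the test XML; in practice you would parse adi.xml.
my $xml = <<'XML';
<root>
    <Metadata>
        <ADI Name="movie" />
        <App_Data Name="Something I don't care about" value="who cares" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="photo" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
</root>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# [ADI/@Name="movie"] keeps only the movie Metadata block;
# [@Name="ChangeMe"] then picks the App_Data entry to edit.
my $xpath = '//Metadata[ADI/@Name="movie"]/App_Data[@Name="ChangeMe"]';
my ($target) = $doc->findnodes($xpath);

# 'new value' is a placeholder, not something from the real ADI data.
$target->setAttribute('Value', 'new value') if $target;

print $doc->toString;
```

Because the DOM is edited in place, serializing the document afterwards should leave everything else as it was parsed, which is what XML::Simple's hash round-trip could not guarantee.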
    

    I find this cheatsheet useful for Perl's LibXML library.
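
For the bonus part of the question (dropping the photo and poster Metadata blocks entirely), a minimal sketch along the same lines is shown here. It assumes the single-root test XML from this answer, inlined in shortened form, and uses unbindNode to detach each unwanted element together with its children; @unwanted and @remaining are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Shortened inline copy of the test XML; in practice you would parse adi.xml.
my $xml = <<'XML';
<root>
    <Metadata>
        <ADI Name="movie" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="photo" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
    <Metadata>
        <ADI Name="poster" />
        <App_Data Name="ChangeMe" Value="" />
    </Metadata>
</root>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Collect the unwanted blocks first, then detach them from the tree.
my @unwanted = $doc->findnodes('//Metadata[ADI/@Name="photo" or ADI/@Name="poster"]');
$_->unbindNode for @unwanted;

my @remaining = $doc->findnodes('//Metadata');
print scalar(@remaining), " Metadata element(s) left\n";
print $doc->toString;
```

The whitespace text nodes that surrounded the removed elements stay behind, so the serialized output may contain some blank lines; that does not affect well-formedness.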