
XML::LibXML::Reader: return value without the CDATA tag


I am reading an XML file with XML::LibXML::Reader:

my $reader = XML::LibXML::Reader->new(IO => $fh, load_ext_dtd => 0) or die qq(cannot read content: $!);

while ($reader->nextElement( 'item' )) {

    my $copy = $reader->copyCurrentNode(1);

    my $title = $copy->findvalue( 'title' );  

}

However, the title in the XML is wrapped in a CDATA section, so when I look at it, it appears as:

<![CDATA[Some title here]]>

I could of course use a regex to strip the extra markup, but I am wondering whether there is a cleaner way to have XML::LibXML::Reader return the title without the CDATA markers.

I've been looking through the docs, but can't find any reference to a way to do that.


Solution

  • It's the parser's job to decode the XML for you, so findvalue already returns the text of the title without the CDATA markers.

    use strict;
    use warnings;
    use feature qw( say );
    
    use XML::LibXML::Reader qw( );
    
    my $xml = '<root><item><title><![CDATA[Some title here]]></title></item></root>';
    
    my $reader = XML::LibXML::Reader->new(string => $xml, load_ext_dtd => 0);
    while ($reader->nextElement( 'item' )) {
        my $copy = $reader->copyCurrentNode(1);
        my $title = $copy->findvalue( 'title' );
        say $title;       # Some title here
    }
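
If you are still seeing the `<![CDATA[...]]>` wrapper, one possible explanation (an assumption on my part, since the question doesn't show how the value was inspected) is that the node is being serialised rather than asked for its text value. The sketch below uses the plain DOM parser (`XML::LibXML->load_xml`) instead of the reader to keep it short; the variable names are only illustrative. The same node methods should be available on the node returned by `copyCurrentNode`.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( );

my $xml = '<root><item><title><![CDATA[Some title here]]></title></item></root>';

# Parse the whole document into a DOM tree (no reader involved here).
my $doc = XML::LibXML->load_xml(string => $xml);
my ($title_node) = $doc->findnodes('/root/item/title');

# Serialising the node reproduces the markup, CDATA section included.
my $serialized = $title_node->toString;

# Asking for the text value returns only the decoded character data.
my $text = $title_node->textContent;

say $serialized;   # <title><![CDATA[Some title here]]></title>
say $text;         # Some title here
```

So a regex should not be needed: `findvalue` or `textContent` gives the decoded text, while `toString` is meant for getting the markup back.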