
Basic parsing of XML string with XML::Twig


I've used XML::Simple for over a decade and it has done everything I need, though I barely touch Perl any more. Right now I need to parse an XML string and simply get all of the elements that are children of the root, and for each one get its element type, attributes, and content (I don't care whether there are nested elements; reading the content as a string is fine). I can do all of that with XML::Simple, except that I also need to preserve the order of the elements, which XML::Simple can't do when there are multiple element types.

I just installed XML::Twig and it looks overwhelming for something I hoped would be a quick script. I'm unlikely to ever use Twig again after this. Is this something Twig can do easily?


Solution

  • At a simple level, XML::Twig can do this by traversing the root's children:

    #!/usr/bin/perl
    
    use strict;
    use warnings; 
    
    use XML::Twig;
    
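    # parsefile() reads from a file; for XML held in a string, use parse($xml_string).
    my $twig = XML::Twig -> new -> parsefile ( 'myxml.xml' );
    
    # children() returns the root's child elements in document order.
    foreach my $element ( $twig -> root -> children ) { 
        print $element -> tag, ": ", $element -> text, "\n"; # element type and content
    }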
    

    A single attribute can be read by name with:

     $element -> att('attributename');
    

    Or you can fetch a hash ref with atts:

     my $attributes = $element -> atts();
     foreach my $key ( keys %$attributes ) {
         print "$key => ", $attributes -> {$key}, "\n";
     }
    

    What I particularly like, though, is that for XML containing a long list of similar elements, you can define a handler that is called each time the parser finishes one of those elements and is handed that subset of the XML.

    sub process_book {
         my ( $twig, $book )  = @_;
         print $book -> first_child_text ('title'), "\n"; # text of the <title> child
         $twig -> purge; #discard anything we've already seen. 
    }
    
    # Element names are case-sensitive, so the key must match <BOOK> in the sample below.
    my $twig = XML::Twig -> new ( twig_handlers => { 'BOOK' => \&process_book } ); 
    $twig -> parsefile ( 'books.xml' ); 
    

    Sample XML:

    <XML>
       <BOOK>
           <title>Elements of style</title>
           <author>Strunk and White</author>
       </BOOK>
    </XML>
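
Putting the pieces above together for the case in the question (XML in a string, children of the root in their original order, each with its type, attributes and content), a minimal sketch might look like the following. The input string and its element and attribute names are made up for illustration, and the shape of the `@records` structure is just one possible choice, not something XML::Twig prescribes.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical input; the element and attribute names are invented for this sketch.
my $xml = '<root>'
        . '<note id="1">first</note>'
        . '<task done="no">second</task>'
        . '<note id="2">third <b>bold</b></note>'
        . '</root>';

# parse() takes the XML as a string (parsefile() takes a file name).
my $twig = XML::Twig->new->parse($xml);

# Collect one record per child of the root, in document order.
my @records;
foreach my $element ( $twig->root->children ) {
    push @records, {
        type    => $element->tag,                  # element name
        atts    => { %{ $element->atts || {} } },  # copy of the attribute hash (may be empty)
        content => $element->text,                 # text content, nested tags dropped
    };
}

# Print each record; attributes are sorted only to make the output stable.
foreach my $record (@records) {
    my $atts = join ', ',
        map { "$_=$record->{atts}{$_}" } sort keys %{ $record->{atts} };
    print "$record->{type} [$atts]: $record->{content}\n";
}
```

If the nested markup should be kept in the content string rather than dropped, `$element->inner_xml` can be used in place of `$element->text`.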