
Parsing XML with Perl & XML::Twig - extract further nested children


I'm trying to figure out how best to process the following sample XML data:

<FOO>
    <A>1</A>
    <B>Some Stuff</B>
    <C>
      <C1>
        <C2A><![CDATA[xxx]]></C2A>
        <C2B><![CDATA[yyy]]></C2B>
      </C1>
    </C>
</FOO>

I'm currently using XML::Twig to process everything else, and I'd like to keep using this module. My goal is to extract the data from C2A and C2B and assign it to variables. Note that there may be multiple C2A and C2B entries, which need to be collected into an array (e.g. @C2A).

My problem is navigating down the tree. In another example I found, which uses the data below, a single chained call is enough:

<MOVIE_LIST>
    <MOVIE>
        <NAME>Name of the Movie</NAME>
        <MOVIE_ID>28372382</MOVIE_ID>
        <DESCRIPTIONS>
            <LONG_DESCRIPTION>This is a long description</LONG_DESCRIPTION>
            <SHORT_DESCRIPTION>short description</SHORT_DESCRIPTION>
        </DESCRIPTIONS>
        <DIRECTOR_LIST>
            <DIRECTOR>director 1</DIRECTOR>
            <DIRECTOR>director 2</DIRECTOR>
        </DIRECTOR_LIST>
    </MOVIE>
    <MOVIE>
        ...
    </MOVIE>
</MOVIE_LIST>

The solution there is:

@directors = $elt->first_child('DIRECTOR_LIST')->children_text('DIRECTOR');
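For reference, that one-liner can be run end to end roughly as follows. This is a sketch that assumes the document is parsed from a string and that each MOVIE is handled in a twig_handlers callback; the %directors_by_name hash is a hypothetical name introduced only for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down copy of the MOVIE_LIST sample (one MOVIE only)
my $xml = <<'XML';
<MOVIE_LIST>
    <MOVIE>
        <NAME>Name of the Movie</NAME>
        <MOVIE_ID>28372382</MOVIE_ID>
        <DIRECTOR_LIST>
            <DIRECTOR>director 1</DIRECTOR>
            <DIRECTOR>director 2</DIRECTOR>
        </DIRECTOR_LIST>
    </MOVIE>
</MOVIE_LIST>
XML

my %directors_by_name;    # hypothetical: NAME text => [ DIRECTOR texts ]

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete MOVIE element has been parsed
        MOVIE => sub {
            my ( $t, $elt ) = @_;
            my @directors = $elt->first_child('DIRECTOR_LIST')->children_text('DIRECTOR');
            $directors_by_name{ $elt->first_child_text('NAME') } = \@directors;
        },
    },
);
$twig->parse($xml);

print "$_: @{ $directors_by_name{$_} }\n" for sort keys %directors_by_name;
```

This only works because every MOVIE in the sample has a DIRECTOR_LIST; the chain has no guard for a missing intermediate element.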

However, sometimes these children don't exist (e.g. no C section is sent at all), which is causing me no end of grief, because chained calls like the following die:

@C = $elt->first_child('C')->first_child('C1')->children_text('C2A');
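To see the failure in isolation, here is a minimal sketch that parses a hypothetical FOO record with no C section (the record is invented for illustration) and wraps the chained call in eval so the error message can be inspected instead of killing the script:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical FOO record that has no C section at all
my $twig = XML::Twig->new;
$twig->parse('<FOO><A>2</A><B>No C here</B></FOO>');
my $elt = $twig->root;    # the FOO element

# first_child('C') returns undef, so the second first_child call dies;
# eval catches the error and leaves the message in $@
my @C2A = eval { $elt->first_child('C')->first_child('C1')->children_text('C2A') };
print "Error: $@" if $@;
```

The message is Perl's standard "Can't call method "first_child" on an undefined value", raised by the second call in the chain.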

I'm rather stumped on how to achieve this and would appreciate any advice; simple answers welcome ;-)


Solution

  • If one of the navigation methods such as first_child doesn't find a matching child, it returns undef, and of course you can't call a method on undef (Perl dies with "Can't call method ... on an undefined value").

    So you have two options:

    You can either test each step of your chained expression:

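    # Each step is only taken if the previous one found something
    my $c  = $elt->first_child('C');
    my $c1 = $c && $c->first_child('C1');

    # The ternary keeps children_text in list context; appending "|| ()"
    # to an && chain would evaluate the chain in scalar context instead
    my @C2A = $c1 ? $c1->children_text('C2A') : ();
    my @C2B = $c1 ? $c1->children_text('C2B') : ();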
    

    or use XPath:

    my @C2A = map { $_->text } $elt->findnodes('./C/C1/C2A');
    my @C2B = map { $_->text } $elt->findnodes('./C/C1/C2B');
    

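    The second option is probably easier to read and maintain: if any step of the path is missing, findnodes simply returns an empty list, so there is nothing to guard.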
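Putting both options together, here is a sketch that processes several FOO records in a twig_handlers callback and copes with a record that has no C section. The RECORDS root element, the second FOO record, the extra C2A entry and the %c2a_by_id / %c2b_by_id hashes are all assumptions made for illustration. The XPath option is written with get_xpath, XML::Twig's XPath-subset query method.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: two FOO records under an assumed RECORDS root.
# The first has repeated C2A entries, the second has no C section.
my $xml = <<'XML';
<RECORDS>
  <FOO>
    <A>1</A>
    <B>Some Stuff</B>
    <C>
      <C1>
        <C2A><![CDATA[xxx]]></C2A>
        <C2B><![CDATA[yyy]]></C2B>
        <C2A><![CDATA[zzz]]></C2A>
      </C1>
    </C>
  </FOO>
  <FOO>
    <A>2</A>
    <B>No C section</B>
  </FOO>
</RECORDS>
XML

my %c2a_by_id;    # hypothetical: A text => [ C2A texts ]
my %c2b_by_id;    # hypothetical: A text => [ C2B texts ]

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per complete FOO record
        FOO => sub {
            my ( $t, $elt ) = @_;
            my $id = $elt->first_child_text('A');

            # Option 1: guard each step; C1 is only looked up if C exists
            my @c2a;
            if ( my $c = $elt->first_child('C') ) {
                if ( my $c1 = $c->first_child('C1') ) {
                    @c2a = $c1->children_text('C2A');
                }
            }

            # Option 2: XPath; a missing step just yields an empty list
            my @c2b = map { $_->text } $elt->get_xpath('./C/C1/C2B');

            $c2a_by_id{$id} = \@c2a;
            $c2b_by_id{$id} = \@c2b;

            $t->purge;    # free the records processed so far
        },
    },
);
$twig->parse($xml);

for my $id ( sort keys %c2a_by_id ) {
    printf "%s: C2A=[%s] C2B=[%s]\n", $id,
        join( ',', @{ $c2a_by_id{$id} } ),
        join( ',', @{ $c2b_by_id{$id} } );
}
```

Both options are mixed here only to show them side by side; in practice you would pick one style and use it for both C2A and C2B. Calling purge inside the handler keeps memory usage flat when the file holds many FOO records.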