
Navigating XML to access CDATA using XML::Twig


I have this XML file and I need to access specific nodes one at a time. Below is a sample of my XML along with my sample code.

My code works, except that it loops through all of the Message/Content tags in the Selection instead of only the Content tags under the current Message tag. For example, while the Message tag with refid="123991123" is being processed I get back 3 Message/Content tags, when I only want the 1 that belongs to that Message. Hope this makes sense. Any help here would be appreciated.

Code:

my $twig = XML::Twig->new(
    twig_handlers => {
        Selection => sub {
            foreach my $message ($_->findnodes('./Contents/Message')) {

                if ($message->att('custom')) {
                    $Message_custom = $message->att('custom');
                    # $_ is still the Selection element here, so this path matches the
                    # Content tags of every Message in the Selection, not just $message's
                    foreach my $Content ($_->findnodes('./Contents/Message/Content')) {
                        print $Selection_id.": ".$Message_refid.": ".$TotalContents++."\n";
                        if ($Content->att('language') eq "en") {
                            if ($Content->att('imagelibraryid')) {
                                $Message_Content_language_en_imagelibraryid = $Content->att('imagelibraryid');
                            } else {
                                $Message_Content_language_en = substr($message->field('Content'), 0, 20);
                            }
                        }
                    }
                }
            }
        },
    },
);

XML:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
  <Selection id="54008473">
    <Name>Master</Name>
    <Contents>
      <Message refid="125796458" suppress="true" status="Unchanged"/>
      <Message refid="123991123" suppress="true" status="Unchanged">
        <Content language="en" imagelibraryid="5492396"/>
      </Message>
      <Message refid="128054778" custom="true" status="New">
        <Content language="en"><![CDATA[<p>Some English content</p>]]></Content>
        <Content language="fr"><![CDATA[<p>Some French content</p>]]></Content>
      </Message>
    </Contents>
  </Selection>
  <Selection id="54008475" datavaluerefid="54008479">
    <Name>RMBC</Name>
    <Contents>
      <Message refid="125796458" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
      <Message refid="123991123" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
      <Message refid="128054778" custom="true" status="New">
        <Content language="en"><![CDATA[<p>ada</p>]]></Content>
      </Message>
    </Contents>
  </Selection>
</Root>

Solution

  • Here is a first attempt at reconstructing what your code is supposed to do, based on the structure of the XML:

    • the handler for Selection nodes looks for Content nodes with attribute language == 'en' under Message nodes under Contents nodes
      • this translates to the XPath ./Contents/Message/Content[@language='en']
      • if the Content node has an imagelibraryid attribute, store that value
      • otherwise store the CDATA content of its first child
      • set refid to the attribute value from the parent Message node
    • append each result to the content list for the Selection node
    • to show what was collected, use Data::Dumper on the array ref
    #!/usr/bin/perl
    use warnings;
    use strict;
    
    use XML::Twig;
    use Data::Dumper;
    
    my %selections;
    
    my $twig = XML::Twig->new(
        twig_handlers => {
            Selection => sub {
                #$_->print();
                print "selection id: ", $_->att('id'), "\n";
    
                my @contents;
                foreach my $content ($_->findnodes("./Contents/Message/Content[\@language='en']")) {
                    my $result = {
                        refid => $content->parent->att('refid'),
                    };
                    my $id     = $content->att('imagelibraryid');
                    if (defined $id) {
                        $result->{library} = $id;
                    } else {
                        $result->{cata}    = $content->first_child->cdata;
                    }
                    push(@contents, $result);
                }
    
                # store collected Content nodes under selection ID
                $selections{ $_->att('id') } = \@contents;
            },
        }
    );
    
    $twig->parse(\*DATA);
    
    # iterate over sorted IDs: each() returns hash entries in an unpredictable order
    foreach my $id (sort keys %selections) {
        my $dump = Dumper($selections{$id});
        print "Selection '${id}' messages: $dump\n";
    }
    
    exit 0;
    
    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    ... the rest of your XML left out ...
    

    Test run:

    $ perl dummy.pl
    selection id: 54008473
    selection id: 54008475
    Selection '54008473' messages: $VAR1 = [
              {
                'refid' => '123991123',
                'library' => '5492396'
              },
              {
                'cata' => '<p>Some English content</p>',
                'refid' => '128054778'
              }
            ];
    
    Selection '54008475' messages: $VAR1 = [
              {
                'cata' => '<p>ada</p>',
                'refid' => '128054778'
              }
            ];
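To answer the scoping question directly: inside the handler `$_` is the Selection element, and `foreach my $message (...)` does not change that, so `$_->findnodes('./Contents/Message/Content')` matches the Content tags of every Message in the Selection. Searching from the loop variable instead, e.g. `$message->children('Content')`, only returns the Content children of the current Message. Below is a minimal sketch of that idea, assuming the XML from the question is inlined as a heredoc; the `@report` array and its "selection: refid: count" format are hypothetical, just to make the per-Message result visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Sample input: the XML from the question, inlined so the script is self-contained
my $xml = <<'XML';
<Root>
  <Selection id="54008473">
    <Name>Master</Name>
    <Contents>
      <Message refid="125796458" suppress="true" status="Unchanged"/>
      <Message refid="123991123" suppress="true" status="Unchanged">
        <Content language="en" imagelibraryid="5492396"/>
      </Message>
      <Message refid="128054778" custom="true" status="New">
        <Content language="en"><![CDATA[<p>Some English content</p>]]></Content>
        <Content language="fr"><![CDATA[<p>Some French content</p>]]></Content>
      </Message>
    </Contents>
  </Selection>
  <Selection id="54008475" datavaluerefid="54008479">
    <Name>RMBC</Name>
    <Contents>
      <Message refid="125796458" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
      <Message refid="123991123" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
      <Message refid="128054778" custom="true" status="New">
        <Content language="en"><![CDATA[<p>ada</p>]]></Content>
      </Message>
    </Contents>
  </Selection>
</Root>
XML

# Hypothetical report: one "selection id: message refid: Content count" line per Message
my @report;

my $twig = XML::Twig->new(
    twig_handlers => {
        Selection => sub {
            my ($t, $selection) = @_;
            foreach my $message ($selection->findnodes('./Contents/Message')) {
                # search from $message, not from the Selection ($_), so only the
                # Content children of this particular Message are returned
                my @contents = $message->children('Content');
                push @report, sprintf("%s: %s: %d",
                    $selection->att('id'), $message->att('refid'), scalar @contents);
            }
        },
    },
);
$twig->parse($xml);

print "$_\n" for @report;
```

For refid="123991123" in the first Selection this reports 1 Content tag rather than the 3 found when searching from the Selection. In the question's own handler, the equivalent change is to loop over `$message->children('Content')` instead of `$_->findnodes('./Contents/Message/Content')`.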