
XML document parsing


<?xml version="1.0" encoding="UTF-8"?>
<Document>
    <DataElement>
        <Serial_Start>1000</Serial_Start>
        <Serial_End>2000</Serial_End>
        <Item value="257896">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
            <ComItemation>
                <Price>80</Price>
                <Sku>20</Sku>
                <Qcode>20</Qcode>
            </ComItemation>
        </Item>
        <Item value="523698">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
            <ComItemation>
                <Price>80</Price>
                <Sku>20</Sku>
                <Qcode>20</Qcode>
            </ComItemation>
        </Item>
        <Item value="856987">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
        </Item>
    </DataElement>
    <DataElement>
        <Serial_Start></Serial_Start>
        <Serial_End></Serial_End>
        <Item value="123456">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
            <ComItemation>
                <Price>80</Price>
                <Sku>20</Sku>
                <Qcode>20</Qcode>
            </ComItemation>
        </Item>
        <Item value="123456">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
            <ComItemation>
                <Price>80</Price>
                <Sku>20</Sku>
                <Qcode>20</Qcode>
            </ComItemation>
        </Item>
        <Item value="123456">
            <ComItemation>
                <Price>00</Price>
                <Sku>20</Sku>
                <Qcode>27</Qcode>
            </ComItemation>
            <ComItemation>
                <Price>80</Price>
                <Sku>20</Sku>
                <Qcode>20</Qcode>
            </ComItemation>
        </Item>
    </DataElement>
</Document>

I'm a newbie to Perl and am trying to parse the XML document above. I need the output in the format below.

Serial Start : 1000 
Serial End   : 2000 
Item : 257896 
Price : 00 
Sku   : 20 
Qcode : 27 
Item  : 257896 
Price : 80 
Sku   : 20 
Qcode : 20

... and so on for each child node.

Sample code so far:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Simple;
use Data::Dumper;
my $xml  = new XML::Simple;
my $data = $xml->XMLin("/home/rocky/PERL/doc.xml");
print Dumper($data);

foreach my $imgrec ( @{ $data->{DataElement} } ) {
   my $Serial_Start = $imgrec->{Serial_Start};
   my $Serial_End   = $imgrec->{Serial_End};
   foreach my $imgrec1 ( @{ $data->{DataElement}->{Item} } ) {
      ## Not sure of this code
      ## Trying on this part.
   }
}

Solution

  • OK, so here's your problem:

    use XML::Simple;
    

    Don't. It only makes your life harder.

    Here's a starter for 10 using XML::Twig. It's not entirely clear how the output you're after maps onto the document, so this is a first pass:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    use XML::Twig;
    
    # Parse the whole file into a twig (note the terminating semicolon).
    my $twig = XML::Twig->new->parsefile('/home/rocky/PERL/doc.xml');

    foreach my $data_element ( $twig->findnodes('//DataElement') ) {
        print "Start:", $data_element->first_child_text('Serial_Start'), "\n";
        print "End:",   $data_element->first_child_text('Serial_End'),   "\n";
        foreach my $item ( $data_element->children('Item') ) {
            print "Item: ", $item->att('value'), "\n";
            foreach my $tag (qw( Price Sku Qcode )) {
                # Offset 0 selects only the first matching descendant.
                print "$tag: ", $item->findnodes( ".//$tag", 0 )->text, "\n";
            }
        }
    }
    

    Note: this finds only the first instance of each tag beneath an Item, not all of them, so an Item with several ComItemation blocks prints just the first block's values.
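
If every ComItemation block is wanted, as the sample output in the question suggests, one option is to loop over the ComItemation children of each Item and repeat the Item value for each block. The following is a minimal sketch under that assumption, not the original answer's code: `report_lines` is a hypothetical helper introduced here, and the inline `$sample` is a shortened copy of the question's document, used so the script runs without the file on disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper (not part of the original answer): builds the report
# as a list of lines so it can be printed or inspected.
sub report_lines {
    my ($twig) = @_;
    my @lines;
    foreach my $data_element ( $twig->findnodes('//DataElement') ) {
        push @lines, 'Serial Start : ' . $data_element->first_child_text('Serial_Start');
        push @lines, 'Serial End   : ' . $data_element->first_child_text('Serial_End');
        foreach my $item ( $data_element->children('Item') ) {
            # One group of lines per ComItemation, repeating the Item value.
            foreach my $block ( $item->children('ComItemation') ) {
                push @lines, 'Item  : ' . $item->att('value');
                foreach my $tag (qw( Price Sku Qcode )) {
                    # Pad the label so the colons line up.
                    push @lines, sprintf( '%-5s : %s', $tag, $block->first_child_text($tag) );
                }
            }
        }
    }
    return @lines;
}

# Shortened sample taken from the question's document.
my $sample = <<'XML';
<Document>
    <DataElement>
        <Serial_Start>1000</Serial_Start>
        <Serial_End>2000</Serial_End>
        <Item value="257896">
            <ComItemation><Price>00</Price><Sku>20</Sku><Qcode>27</Qcode></ComItemation>
            <ComItemation><Price>80</Price><Sku>20</Sku><Qcode>20</Qcode></ComItemation>
        </Item>
    </DataElement>
</Document>
XML

my $twig = XML::Twig->new;
$twig->parse($sample);
print "$_\n" for report_lines($twig);
```

To run it against the real file, replace the `parse($sample)` call with `parsefile('/home/rocky/PERL/doc.xml')`. Empty Serial_Start and Serial_End elements, as in the second DataElement, simply produce empty values after the label.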