XML::Simple is not detecting all elements


I'm trying to parse some XML in Perl using XML::Simple.

The XML follows a format of:

   <result>
    <doc>
      <field name="title">Sample Title</field>
      <field name="content">Content 1</field>
      <field name="content">Content 2</field>
      .
      .
      .
      <field name="content">Content n</field>
    </doc>
   </result>

Using XML::Simple, I attempted to parse this and print the title and all content. The problem was that only the last content item was being printed. I decided to use Data::Dumper, and this is what it returns:

$VAR1= {
  'result'=> {  
           'doc' => [
                {
                  'field' => {                    
                                'content' => {
                                             'content' => 'Content n'
                                             },
                                'title' => {
                                             'content' => 'Sample Title'
                                           }
                                 }
                      }

Only the last content item is shown for each doc element. Is there any reason for this? What can I do to have it detect all of the content items?

Here's the code:

use strict;
use warnings;
use LWP::Simple;    # provides get()
use XML::Simple;
use Data::Dumper;

my $url = "http://www.testurl.com/test.xml";
my $content = get $url;
die "Couldn't get XML" unless defined $content;

my $xml = XML::Simple->new;
my $xmlData = $xml->XMLin($content);
print Dumper($xmlData);

Solution

  • Per the POD:

    Note 1: The default value for 'KeyAttr' is ['name', 'key', 'id']. If you do not want folding on input or unfolding on output you must setting this option to an empty list to disable the feature.

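    XML::Simple is treating the "name" attribute as a key and folding the <field> elements into a hash keyed on its value. Several of your fields share name="content", so each one overwrites the previous and only the last survives.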

     perl -MXML::Simple -MData::Dumper
    
    my $raw = <<XML_SAMPLE;
     <result>
        <doc>
          <field name="title">Sample Title</field>
          <field name="content">Content 1</field>
          <field name="content">Content 2</field>
          .
          .
          .
          <field name="content">Content n</field>
        </doc>
       </result>
    XML_SAMPLE
    
    my $xml = XML::Simple->new;
    my $xmlData = $xml->XMLin($raw, KeyAttr => []);
    print Dumper($xmlData);
    
    __END__
    $VAR1 = {
              'doc' => {
                       'content' => '
          .
          .
          .
          ',
                       'field' => [
                                  {
                                    'content' => 'Sample Title',
                                    'name' => 'title'
                                  },
                                  {
                                    'content' => 'Content 1',
                                    'name' => 'content'
                                  },
                                  {
                                    'content' => 'Content 2',
                                    'name' => 'content'
                                  },
                                  {
                                    'content' => 'Content n',
                                    'name' => 'content'
                                  }
                                ]
                     }
            };
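
Once folding is disabled, printing the title and every content item comes down to walking the 'field' array. Below is a minimal sketch of that, not a definitive implementation. It assumes a trimmed copy of the sample XML with the elided fields left out, and it adds the ForceArray option (my addition, not part of the code above) so that 'doc' and 'field' are always array references, even when a document has only one of them. The @lines array is just an illustrative helper.

```perl
use strict;
use warnings;
use XML::Simple;

# Trimmed sample in the question's format (the elided fields are left out).
my $raw = <<'XML_SAMPLE';
<result>
  <doc>
    <field name="title">Sample Title</field>
    <field name="content">Content 1</field>
    <field name="content">Content 2</field>
  </doc>
</result>
XML_SAMPLE

# KeyAttr => []  disables folding on the name/key/id attributes.
# ForceArray     (an assumption added here) makes doc and field array refs
#                even when only a single element is present.
my $data = XML::Simple->new->XMLin(
    $raw,
    KeyAttr    => [],
    ForceArray => [ 'doc', 'field' ],
);

# Collect "name: text" for every field of every doc, in document order.
my @lines;
for my $doc ( @{ $data->{doc} } ) {
    for my $field ( @{ $doc->{field} } ) {
        push @lines, "$field->{name}: $field->{content}";
    }
}

print "$_\n" for @lines;
```

With this sample it prints one line per field: the title first, then each content item. Without ForceArray, a doc containing a single field would give you a hash reference instead of an array reference, and the inner loop would die under strict.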