xml perl libxml2 xml-libxml

Parsing XML with Perl


Total noob here so I am sorry for my ignorance in advance.

Most of what I have searched and messed around with has centered around using XML::LibXML with XPath.

The problem is that I am not looking to capture the text between tags: I need the attribute values and the names of the tags themselves.

This is my XML structure:

<users>
  <entry name="asd">
    <permissions>
      <role-based>
        <superuser>yes</superuser>
      </role-based>
    </permissions>
  </entry>
  <entry name="fgh">
    <permissions>
      <role-based>
        <superuser>yes</superuser>
      </role-based>
    </permissions>
    <authentication-profile>RSA Two-Factor</authentication-profile>
  </entry>
  <entry name="jkl">
    <permissions>
      <role-based>
        <superreader>yes</superreader>
      </role-based>
    </permissions>
    <authentication-profile>RSA Two-Factor</authentication-profile>
  </entry>
</users>

I am trying to grab the name attribute (without the quotes) and also determine whether this person is a superuser or superreader.

I am stuck: I can't do much more than print the nodes. I need to turn this into a CSV file with the structure username;role.


Solution

  • The easiest way to extract information from XML documents with XML::LibXML is to use the find family of methods (findnodes, findvalue, find). These methods take an XPath expression and return the matching nodes or values from the document. The following script extracts the data you need:

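    use strict;
    use warnings;
    use XML::LibXML;

    # Parse the XML file into a document object.
    my $doc = XML::LibXML->load_xml(location => 'so.xml');

    for my $entry ($doc->findnodes('//entry')) {
        # The attribute value comes back without the surrounding quotes.
        my $name = $entry->getAttribute('name');
        # local-name() yields the element name of the child whose text is "yes".
        my $role = $entry->findvalue(
            'local-name(permissions/role-based/*[.="yes"])'
        );
        print("$name;$role\n");
    }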
    

    It prints:

    asd;superuser
    fgh;superuser
    jkl;superreader
    

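    I used the local-name XPath function to get the name of the role element: it returns the name of the first node in the selected set, which here is the child of role-based whose text is "yes".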

    Note that you might want to use Text::CSV to create CSV files in a more robust way.
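
The Text::CSV suggestion above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the inline sample XML, the helper name extract_rows, and the output path users.csv are assumptions made for the example, and the real script would keep load_xml(location => 'so.xml').

```perl
use strict;
use warnings;
use XML::LibXML;
use Text::CSV;

# Inline sample so the sketch is self-contained (assumption for the example);
# the real script would load the file with load_xml(location => 'so.xml').
my $xml = <<'XML';
<users>
  <entry name="asd">
    <permissions><role-based><superuser>yes</superuser></role-based></permissions>
  </entry>
  <entry name="jkl">
    <permissions><role-based><superreader>yes</superreader></role-based></permissions>
  </entry>
</users>
XML

# Hypothetical helper: returns one [name, role] pair per <entry> element.
sub extract_rows {
    my ($doc) = @_;
    my @rows;
    for my $entry ($doc->findnodes('//entry')) {
        my $name = $entry->getAttribute('name');
        my $role = $entry->findvalue(
            'local-name(permissions/role-based/*[.="yes"])'
        );
        push @rows, [$name, $role];
    }
    return @rows;
}

my $doc  = XML::LibXML->load_xml(string => $xml);
my @rows = extract_rows($doc);

# sep_char => ';' matches the "username;role" layout from the question;
# eol => "\n" makes print() terminate each record with a newline.
my $csv = Text::CSV->new({
    sep_char  => ';',
    binary    => 1,
    auto_diag => 1,
    eol       => "\n",
});

# 'users.csv' is a placeholder output path.
open my $fh, '>:encoding(UTF-8)', 'users.csv' or die "users.csv: $!";
$csv->print($fh, $_) for @rows;
close $fh or die "users.csv: $!";
```

For the sample above, users.csv ends up with the lines asd;superuser and jkl;superreader. The benefit over a plain print is that Text::CSV quotes any field that itself contains a semicolon, a double quote, or a newline, so an unusual user name does not break the file.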