perl, xml-parsing, mediawiki-api

XML::Simple not grabbing individual XML nodes


I'm using the MediaWiki API to get search results. I simply want to grab the URL of the first result, the XML element named 'Url'. There will eventually be other things I want to do with the XML, but I expect an answer to this will show me what I'm doing wrong so I can work out the rest. The page I'm working with is the one requested in the code below.

require URI;              # URI->new is called below, so load it explicitly
require HTTP::Request;
require LWP::UserAgent;
require XML::Simple;

# Build the OpenSearch query and fetch it
my $url = URI->new("http://en.wikipedia.org/w/api.php?action=opensearch&search=rooney&limit=10&namespace=0&format=xml");
my $request = HTTP::Request->new(GET => $url);
my $ua = LWP::UserAgent->new;
my $response = $ua->request($request);

# Parse the XML response body into a nested hash reference
my $xml = XML::Simple->new();
my $data = $xml->XMLin($response->content);

Everything up to here seems to work fine. My HTTP request goes through all right: if I just print $response->content, it returns the XML content fine, and if I print $data, I am told that it is a hash.

In an attempt to get the 'Url' element, I have tried numerous approaches based on the searching I've done. A few are below:

print $data->{'Url'};
print $data->{Url};
print $data{Url}

Solution

  • Pro tip: use Data::Dumper to look inside your data structure.

    use Data::Dumper;
    print Dumper($data);
    

    You'll get something like this ...

    $VAR1 = {
      'xmlns' => 'http://opensearch.org/searchsuggest2',
      'Section' => {
        'Item' => [
          {
            'Url' => {
              'content' => 'http://en.wikipedia.org/wiki/Rooney',
              'xml:space' => 'preserve'
            },
            'Description' => {
              'content' => 'Rooney may refer to:',
              'xml:space' => 'preserve'
            },
            'Text' => {
              'content' => 'Rooney',
              'xml:space' => 'preserve'
            }
          },
    ... much much more ...
    

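    From this you can see that 'Url' is not a top-level key, which is why $data->{Url} prints nothing. It is nested under 'Section', inside the first element of the 'Item' array, with its text stored under 'content'. The route to your desired data is therefore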

    $data->{Section}{Item}[0]{Url}{content}
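
One caveat with that path: XML::Simple's documented default is to fold a lone child element into a hash instead of a one-element array, so {Item}[0] can stop working when a search returns a single result. The following is a minimal sketch, not the asker's actual code, that uses the ForceArray and KeyAttr options to keep the structure stable. The inline XML is a trimmed, hand-written stand-in for the API response (an assumption), so it runs without a network connection.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hand-written sample modelled on the Dumper output above (assumption:
# the real response carries more elements and more <Item> entries).
my $sample = q{<SearchSuggestion xmlns="http://opensearch.org/searchsuggest2">
  <Section>
    <Item>
      <Text xml:space="preserve">Rooney</Text>
      <Url xml:space="preserve">http://en.wikipedia.org/wiki/Rooney</Url>
    </Item>
  </Section>
</SearchSuggestion>};

# ForceArray => ['Item'] keeps Item an array reference even when there is
# only one result; KeyAttr => [] switches off folding on name/key/id.
my $data = XMLin($sample, ForceArray => ['Item'], KeyAttr => []);

# Same path as in the answer, now safe for single-result responses.
my $first_url = $data->{Section}{Item}[0]{Url}{content};
print "$first_url\n";

# Walk every result instead of only the first one.
for my $item (@{ $data->{Section}{Item} }) {
    print "$item->{Text}{content} => $item->{Url}{content}\n";
}
```

Without ForceArray, the single Item in this sample would come back as a plain hash, and the [0] index would fail under strict.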
    

    You should also look into using something like XML::XPath, which makes it much easier to conduct this kind of search.
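
As a rough illustration of the XML::XPath suggestion, here is a small sketch that pulls the same value with a path expression instead of walking nested hashes. The sample document is hand-written and simplified (an assumption): the xmlns and xml:space attributes are left out, and the second Item is a made-up placeholder. The real response declares a default namespace, which may need extra handling depending on the XPath module you use.

```perl
use strict;
use warnings;
use XML::XPath;

# Simplified stand-in for the API response (assumption: namespace
# declaration removed, second item is a placeholder for illustration).
my $sample = q{<SearchSuggestion>
  <Section>
    <Item>
      <Text>Rooney</Text>
      <Url>http://en.wikipedia.org/wiki/Rooney</Url>
    </Item>
    <Item>
      <Text>Placeholder result</Text>
      <Url>http://example.org/placeholder</Url>
    </Item>
  </Section>
</SearchSuggestion>};

my $xp = XML::XPath->new(xml => $sample);

# XPath positions are 1-based, so Item[1] is the first result.
my $first_url = $xp->findvalue('/SearchSuggestion/Section/Item[1]/Url');
print "First URL: $first_url\n";

# Collect the text of every Url element, wherever it sits under an Item.
my @nodes = $xp->findnodes('//Item/Url');
my @urls  = map { $_->string_value } @nodes;
print "$_\n" for @urls;
```

The path expression states what you want in one line, and it keeps working if the surrounding elements change shape, which is the main attraction over indexing into the XML::Simple structure by hand.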