I'm trying to parse data from Archive.org's search functionality. The data looks like this:
<doc>
<float name="avg_rating">5.0</float>
<arr name="collection"><str>U-Melt</str><str>etree</str></arr>
<arr name="format"><str>Checksums</str><str>Flac</str><str>Flac FingerPrint</str>
<str>Metadata</str><str>Ogg Vorbis</str><str>Text</str><str>VBR M3U</str>
<str>VBR MP3</str><str>VBR ZIP</str></arr>
<str name="identifier">umelt2009-09-19.main.km184.flac16</str>
<str name="mediatype">etree</str>
<int name="num_reviews">1</int>
</doc>
Here's a link to the full XML.
PHP's SimpleXML reaches each doc without trouble and reads the items labeled str and arr just fine. It's the items labeled float, int, or long that it chokes on, and I can't figure out why.
My parsing code is as follows:
/* OPENING FILE */
$xml = simplexml_load_file($pathname.$identifier_list);
//Check the file to make sure it's got XML in it
$xmlCheck = file_get_contents($pathname.$identifier_list);
$xmlCheck = substr($xmlCheck,0,4);
if ($xmlCheck != "<?xm") {
    die("<p>WARNING: ".$filename." doesn't look like XML, quitting. Check it to see what's wrong.");
}
else {
    $result = $xml->result;
    echo "<br/><br/>".$result['name']."<br/>";
    $counter = 1;
    foreach ($result->doc as $doc) {
        echo "<br/><b>Document ".$counter."</b>";
        $counter++;
        foreach ($doc->children() as $item) {
            echo $item->getName();
            switch ((string) $item['name']) {
                case 'identifier':
                    echo "<br/>Identifier: ".$item."\n";
                    break;
                case 'licenseurl':
                    echo "<br/>License URL: ".$item."\n";
                    break;
                case 'mediatype':
                    echo "<br/>Mediatype: ".$item."\n";
                    break;
                case 'downloads':
                    echo "<br/>Downloads: ".$item."\n";
                    break;
                case 'avg_rating':
                    echo "<br/>Average Rating: ".$item."\n";
                    break;
                case 'collection':
                    echo "<br/>Collection: ".$item."\n";
                    break;
            }
        }
        echo "<br/>";
    }
}
I've tried using ->children(), ->doc and ->long or ->int. None of these seem to pick up the long/int/float items. I'm beginning to think that it's because they're primitives, but I don't know how to fix this issue.
Thanks in advance for your help.
Taking a look at that XML data (the search.xml you linked to), I don't seem to have a problem.
For instance, if I do this :
$xml = simplexml_load_file('search.xml');
foreach ($xml->result->doc as $doc) {
var_dump($doc);
}
I have several outputs, each looking like this :
object(SimpleXMLElement)[3]
  public 'float' => string '0.0' (length=3)
  public 'arr' =>
    array
      0 =>
        object(SimpleXMLElement)[5]
          public '@attributes' =>
            array
              'name' => string 'collection' (length=10)
          public 'str' =>
            array
              0 => string 'sijis' (length=5)
              1 => string 'netlabels' (length=9)
              2 => string 'netlabels' (length=9)
      1 =>
        object(SimpleXMLElement)[6]
          public '@attributes' =>
            array
              'name' => string 'format' (length=6)
          public 'str' =>
            array
              0 => string '256Kbps MP3' (length=11)
              1 => string 'Text' (length=4)
  public 'long' => string '4721' (length=4)
  public 'str' =>
    array
      0 => string 'sijis_SI8' (length=9)
      1 => string 'http://creativecommons.org/licenses/by-nc-sa/2.0/' (length=49)
      2 => string 'audio' (length=5)
  public 'int' => string '0' (length=1)
(I'm using Xdebug, which gives me nice var_dumps)
This shows that 'int', 'long', and equivalents are immediate children of the $doc used in the loop, which means you can use something like this:
$xml = simplexml_load_file('search.xml');
foreach ($xml->result->doc as $doc) {
echo $doc->long . ' ; ' . $doc->float . '<br />';
}
to get to the 'long' and 'float' data, which gives this kind of output for the first documents:
4721 ; 0.0
;
2206 ; 0.0
1239 ; 3.5
Does this help you ?
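If a doc ever carries more than one long or int element, $doc->long alone only gives you the first one, so it can be safer to select by the name attribute. Here is a minimal sketch of that idea using xpath(). The inline XML is a hypothetical sample modelled on the doc in the question (the response root element and the combination of values are my assumptions), so adjust it to the real search.xml:

```php
<?php
// Hypothetical sample modelled on the <doc> structure in the question.
// The <response> root element is an assumption; only result/doc is known.
$xmlString = <<<XML
<response>
  <result name="response">
    <doc>
      <float name="avg_rating">5.0</float>
      <arr name="collection"><str>U-Melt</str><str>etree</str></arr>
      <long name="downloads">4721</long>
      <str name="identifier">umelt2009-09-19.main.km184.flac16</str>
      <str name="mediatype">etree</str>
      <int name="num_reviews">1</int>
    </doc>
  </result>
</response>
XML;

$xml = simplexml_load_string($xmlString);

foreach ($xml->result->doc as $doc) {
    // xpath() returns an array of matching elements (empty if none match).
    $downloads = $doc->xpath('long[@name="downloads"]');
    $rating    = $doc->xpath('float[@name="avg_rating"]');

    // Cast explicitly: SimpleXML hands back objects, not PHP numbers.
    $downloadCount = $downloads ? (int) $downloads[0] : null;
    $avgRating     = $rating ? (float) $rating[0] : null;

    echo $downloadCount . ' ; ' . $avgRating . "\n";
}
```

The explicit (int) and (float) casts matter if you want to do arithmetic or comparisons with the values afterwards; for plain echo, the string conversion happens on its own.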
Actually, your code seems to work quite OK for me; if I remove the "echo $item->getName();" line, to get a clearer output, I get, for the first document:
Document 1
Average Rating: 0.0
Collection:
Downloads: 4721
Identifier: sijis_SI8
License URL: http://creativecommons.org/licenses/by-nc-sa/2.0/
Mediatype: audio
Which seems OK, when looking at the XML ?
For instance, the downloads count seems OK ?
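The only line that comes out empty is Collection, and that looks expected to me: the arr element has no text of its own, only str children, so echoing it directly prints nothing. A rough sketch of one way to handle it, looping over the str children (the inline doc is a trimmed, hypothetical copy of the sample from the question, and $values is just a name I picked):

```php
<?php
// Hypothetical single <doc>, trimmed from the sample in the question.
$doc = simplexml_load_string(
    '<doc>'
    . '<arr name="collection"><str>U-Melt</str><str>etree</str></arr>'
    . '<str name="identifier">umelt2009-09-19.main.km184.flac16</str>'
    . '</doc>'
);

foreach ($doc->children() as $item) {
    if ((string) $item['name'] === 'collection') {
        // Collect the text of each <str> child of the <arr> element.
        $values = [];
        foreach ($item->str as $str) {
            $values[] = (string) $str;
        }
        echo 'Collection: ' . implode(', ', $values) . "\n";
    }
}
// prints "Collection: U-Melt, etree"
```

The same loop could go inside the 'collection' case of your switch, and presumably a 'format' case would work the same way.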