
php - using SimpleXML to parse an OAI-PMH file


I'm trying to parse this file: http://mdc.cbuc.cat/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc&set=afcecemc&from=2011-06-21&until=2011-06-21 using SimpleXML.

I can get all elements except those which are inside the <metadata> tag. It says that the tag is empty. Here is my code:

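function getXMLfile($URL) {
    // Fetch the OAI-PMH response body as a string.
    $chDyn = curl_init();
    curl_setopt($chDyn, CURLOPT_URL, $URL);
    curl_setopt($chDyn, CURLOPT_RETURNTRANSFER, 1);
    $xml = curl_exec($chDyn);
    curl_close($chDyn);

    // Stays null if parsing fails, so the return below never reads an undefined variable.
    $xmlObj = null;
    try {
        $xmlObj = new SimpleXMLElement($xml);
    }
    catch (Exception $e) { echo $e; }

    return $xmlObj;
}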


$cdmURL = "http://mdc.cbuc.cat/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc&set=afcecemc";

$xmlObj = getXMLfile($cdmURL);
$xmlNode = $xmlObj->ListRecords;

foreach ($xmlNode->record as $rNode) {
    var_dump($rNode->children());
}

But this is the output:

[...]
["metadata"]=>
  object(SimpleXMLElement)#8 (0) {
}

This element is not empty! I know that the solution is somehow related to using "namespaces", but I can't figure out how to make it work.

Any help will be appreciated! Thanks.


Solution

  • To access children that are in their own namespace, you have to tell SimpleXMLElement that you want children that are not in the default namespace. See SimpleXMLElement::children.

    The document you linked uses multiple namespaces, so it's probably a bit confusing if you're new to them. Here the metadata element contains a dc element in the oai_dc namespace, and the children of that element are in turn in the dc namespace, so each level has to be requested with its own prefix.

    The following example code extends yours (and simplifies the loading a bit, but I think you'll understand it) to access the children inside the first record element (I break out of the loop after it):

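    $url = 'http://mdc.cbuc.cat/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc&set=afcecemc&from=2011-06-21&until=2011-06-21';

    $xmlObj = simplexml_load_file($url);

    $xmlNode = $xmlObj->ListRecords;

    foreach ($xmlNode->record as $rNode) {
        // Children in the default namespace: header and (seemingly empty) metadata.
        var_dump($rNode->children());
        // The second argument (true) means 'oai_dc' is a prefix, not a namespace URI.
        var_dump($rNode->metadata->children('oai_dc', true));
        // The Dublin Core fields live one level deeper, under the 'dc' prefix.
        var_dump($rNode->metadata->children('oai_dc', true)->dc->children('dc', true));
        break;
    }
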

    This gives the following output, which I guess is what you're looking for:

    object(SimpleXMLElement)#7 (2) {
      ["header"]=>
      object(SimpleXMLElement)#9 (3) {
        ["identifier"]=>
        string(29) "oai:mdc.cbuc.cat:afcecemc/521"
        ["datestamp"]=>
        string(10) "2011-06-21"
        ["setSpec"]=>
        string(8) "afcecemc"
      }
      ["metadata"]=>
      object(SimpleXMLElement)#10 (0) {
      }
    }
    object(SimpleXMLElement)#10 (1) {
      ["dc"]=>
      object(SimpleXMLElement)#8 (0) {
      }
    }
    object(SimpleXMLElement)#7 (12) {
      ["title"]=>
      string(12) "Puig d'Assas"
      ["creator"]=>
      string(26) "Gallardo i Garriga, Antoni"
      ["date"]=>
      string(19) "[Entre 1912 i 1928]"
      ["relation"]=>
      array(2) {
        [0]=>
        string(72) "Paper; gelatina i plata; positiu; blanc i negre; horitzontal; 12 x 17 cm"
        [1]=>
        string(27) "Estudi de la Masia Catalana"
      }
      ["subject"]=>
      string(9) "Muntanyes"
      ["coverage"]=>
      string(32) "Puig d'Assas ; Osona ; Catalunya"
      ["description"]=>
      array(2) {
        [0]=>
        string(2) "Bo"
        [1]=>
        string(163) "Títol atorgat pel catalogador. Informació extreta dels àlbums de l'EMC: Situació: Puig d'Assas. Facilitada per: Antoni Gallardo i Garriga. Facilitada en: 1928."
      }
      ["publisher"]=>
      string(33) "Centre Excursionista de Catalunya"
      ["source"]=>
      string(29) "Memòria Digital de Catalunya"
      ["type"]=>
      string(5) "Image"
      ["rights"]=>
      string(49) "http://creativecommons.org/licenses/by-nc-nd/3.0/"
      ["identifier"]=>
      string(35) "http://mdc.cbuc.cat/u?/afcecemc,521"
    }
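
    As a complement to the prefix-based calls above, here is a minimal sketch that selects the same children by namespace URI instead. Prefixes such as oai_dc and dc are only aliases chosen by the server, so matching on the URI should keep working if a repository picks different ones. The inline XML, the identifier oai:example.org:demo/1, the constant names and the extractRecords() helper are all made up for illustration; only the two namespace URIs are the standard OAI-PMH and Dublin Core ones.

```php
<?php
// Hypothetical, trimmed-down OAI-PMH response used only for illustration.
$xml = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:demo/1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:description>First</dc:description>
          <dc:description>Second</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
XML;

// Namespace URIs for the oai_dc container and the Dublin Core elements.
const NS_OAI_DC = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
const NS_DC     = 'http://purl.org/dc/elements/1.1/';

// Hypothetical helper: returns [identifier => [field name => list of values]].
function extractRecords(SimpleXMLElement $root): array
{
    $records = [];
    foreach ($root->ListRecords->record as $record) {
        // Skip records that carry no <metadata> element.
        if (!isset($record->metadata)) {
            continue;
        }
        // Without a second argument, children() treats its argument as a namespace URI.
        $dc = $record->metadata->children(NS_OAI_DC)->dc->children(NS_DC);

        $fields = [];
        foreach ($dc as $name => $value) {
            // Repeated elements (e.g. several descriptions) are collected in order.
            $fields[$name][] = (string) $value;
        }
        $records[(string) $record->header->identifier] = $fields;
    }
    return $records;
}

$records = extractRecords(new SimpleXMLElement($xml));
print_r($records);
```

    Casting each element to string gives plain arrays that are easier to work with than the nested SimpleXMLElement objects shown by var_dump. To run this against the real feed, the SimpleXMLElement built from the inline string could be replaced with the result of simplexml_load_file($url).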