Parse XML with namespaces using SimpleXML in PHP


I am trying to parse an XML document like this:

<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection 
    xmlns:ogc="http://www.opengis.net/ogc" 
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:p="http://example.org">
    <gml:featureMember>
        <p:Point>
            <gml:pointProperty>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84307585 43.46031547</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84299411 43.46018513</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84299935 43.45998723</gml:pos>
                </gml:Point>
                <!-- 
                    ... many more <gml:Point> nodes ...
                --> 
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84309913 43.46054546</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84307585 43.46031547</gml:pos>
                </gml:Point>
            </gml:pointProperty>
        </p:Point>
    </gml:featureMember>
</gml:FeatureCollection>

I want to get each gml:pos value and save it to a DB, but for the moment I am happy just printing them to the web page (echo...).

$output = simplexml_load_string($output);
$xml = $output->getNamespaces(true); 
//print_r( $xml);
$xml_document = $output->children($xml["p"]);
foreach($xml_document->Point->children($xml["gml"]);
    echo $xml_point->Point[0];
echo $xml->FeatureCollection; 
}

$output holds the complete XML, with tons of coordinates in gml:Point elements.

I am trying to get to the points using namespaces, but I must be doing something wrong, because I can't print anything but the word Array (even when using print_r...).


Solution

  • You should not extract the namespace URIs from the document (as getNamespaces() does). The namespace URI is a unique string that identifies the XML semantics an element belongs to. In other words, the URI identifies the namespace, not the alias. Your XML is a good example of that, because it has Point elements in two different namespaces.

    • p:Point resolves to {http://example.org}Point
    • gml:Point resolves to {http://www.opengis.net/gml}Point

    Namespace prefixes like p and gml are aliases that make a document smaller and more readable. They are only valid for the element they are defined on and its descendants. They can be redefined on any element, and they are optional for elements. More importantly, they are only valid for that document. The following three examples all resolve to {http://example.org}Point.

    • Prefix "p": <p:Point xmlns:p="http://example.org"/>
    • Prefix "example": <example:Point xmlns:example="http://example.org"/>
    • No prefix: <Point xmlns="http://example.org"/>
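    The equivalence of the three forms above can be checked with a short sketch. It uses the DOM extension's namespaceURI and localName properties rather than SimpleXML, purely for illustration; the array keys are made-up labels.

```php
<?php
// The three spellings from the list above; the keys are only labels.
$documents = [
  'prefix p'       => '<p:Point xmlns:p="http://example.org"/>',
  'prefix example' => '<example:Point xmlns:example="http://example.org"/>',
  'no prefix'      => '<Point xmlns="http://example.org"/>',
];

$resolved = [];
foreach ($documents as $label => $xml) {
  $document = new DOMDocument();
  $document->loadXML($xml);
  $node = $document->documentElement;
  // Build the {namespace-uri}local-name notation used in this answer.
  $resolved[$label] = '{' . $node->namespaceURI . '}' . $node->localName;
}

// Every entry is the same string, regardless of the prefix used.
var_dump($resolved);
```

    The prefix never shows up in the resolved name, which is why code that reads the XML should depend on the URI only.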

    To read XML, you define your own prefixes for the namespaces and use them with XPath, or you use the namespace-aware variants of the DOM methods, like getAttributeNS(). XPath is by far the more elegant solution. You can use the prefixes from the document or different ones.
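    As a rough sketch of both options with the DOM extension: the XML below is an abbreviated copy of the document from the question (two points only), and the prefixes geo and feature are deliberately different from the ones in the document to show that they are your own choice.

```php
<?php
// Abbreviated sample of the XML from the question (assumption: two points are enough to illustrate).
$content = <<<'XML'
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:p="http://example.org">
  <gml:featureMember>
    <p:Point>
      <gml:pointProperty>
        <gml:Point srsName="epsg:4258">
          <gml:pos>-3.84307585 43.46031547</gml:pos>
        </gml:Point>
        <gml:Point srsName="epsg:4258">
          <gml:pos>-3.84299411 43.46018513</gml:pos>
        </gml:Point>
      </gml:pointProperty>
    </p:Point>
  </gml:featureMember>
</gml:FeatureCollection>
XML;

$document = new DOMDocument();
$document->loadXML($content);

// Option 1: XPath with self-defined prefixes, registered once on the DOMXPath object.
$xpath = new DOMXPath($document);
$xpath->registerNamespace('geo', 'http://www.opengis.net/gml');
$xpath->registerNamespace('feature', 'http://example.org');
$viaXpath = [];
foreach ($xpath->evaluate('//feature:Point//geo:pos') as $pos) {
  $viaXpath[] = $pos->textContent;
}

// Option 2: a namespace-aware DOM method, addressed by URI and local name.
$viaDom = [];
foreach ($document->getElementsByTagNameNS('http://www.opengis.net/gml', 'pos') as $pos) {
  $viaDom[] = $pos->textContent;
}

var_dump($viaXpath, $viaDom);
```

    With DOMXPath the namespaces are registered once for the whole document, which differs from the SimpleXML behaviour.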

    SimpleXML requires you to register the namespaces on every SimpleXMLElement on which you call xpath(). To avoid code duplication, I suggest defining an array with the namespaces and a function that registers them.

    // define an array with the namespaces you're using
    // the keys are your aliases - independent of the XML document.
    $xmlns = [
      'gml' => 'http://www.opengis.net/gml',
      'p' => 'http://example.org'
    ];
    // wrap the array in a function
    $registerNamespaces = static function(SimpleXMLElement $target) use ($xmlns) {
      foreach ($xmlns as $alias => $uri) {
        $target->registerXPathNamespace($alias, $uri);
      }
    };
    
    // $content holds the XML string ($output in the question)
    $element = simplexml_load_string($content);
    $registerNamespaces($element);
    
    $result = [];
    // use your prefixes to reference the namespaces in XPath
    $positions = $element->xpath('//p:Point[1]//gml:pos');
    foreach ($positions as $pos) {
      $result[] = (string)$pos;
    }
    
    var_dump($result);
    

    Output: https://eval.in/159739

    array(5) {
      [0]=>
      string(23) "-3.84307585 43.46031547"
      [1]=>
      string(23) "-3.84299411 43.46018513"
      [2]=>
      string(23) "-3.84299935 43.45998723"
      [3]=>
      string(23) "-3.84309913 43.46054546"
      [4]=>
      string(23) "-3.84307585 43.46031547"
    }
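    If you would rather stay close to the children() calls from the question, a minimal sketch could look like the following. It passes the namespace URIs as fixed strings instead of reading them from the document. The XML is again an abbreviated sample, and splitting each gml:pos into two floats is only an assumption about how the values might be prepared for the DB; nothing here says which number is which axis.

```php
<?php
// Namespace URIs as constants in the code, not taken from the document.
$gml = 'http://www.opengis.net/gml';
$p = 'http://example.org';

// Abbreviated sample of the XML from the question (two points only).
$content = <<<'XML'
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:p="http://example.org">
  <gml:featureMember>
    <p:Point>
      <gml:pointProperty>
        <gml:Point srsName="epsg:4258">
          <gml:pos>-3.84307585 43.46031547</gml:pos>
        </gml:Point>
        <gml:Point srsName="epsg:4258">
          <gml:pos>-3.84299411 43.46018513</gml:pos>
        </gml:Point>
      </gml:pointProperty>
    </p:Point>
  </gml:featureMember>
</gml:FeatureCollection>
XML;

$collection = simplexml_load_string($content);

$result = [];
// Select the namespace explicitly at every level with children($uri).
foreach ($collection->children($gml)->featureMember as $member) {
  foreach ($member->children($p)->Point as $feature) {
    $property = $feature->children($gml)->pointProperty;
    foreach ($property->children($gml)->Point as $point) {
      $pos = trim((string)$point->children($gml)->pos);
      // Split "a b" on whitespace; the meaning of each number is not decided here.
      [$first, $second] = preg_split('/\s+/', $pos);
      $result[] = [(float)$first, (float)$second];
    }
  }
}

var_dump($result);
```

    Compared with the XPath version this is more verbose, because every step has to name its namespace, but it avoids registering prefixes on each element.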