
How do you print nodeValue with the same tag name (at different levels) and different values?


Via the link https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text I retrieved a record in XML format. From it I extracted the text of every Gene-commentary_heading tag, using DOMDocument and getElementsByTagName. Now I am trying to retrieve one particular section, for example the one named GeneOntology, which is the 22nd Gene-commentary_heading tag. So far I only get the text inside the heading tag itself:

<Gene-commentary_heading>GeneOntology</Gene-commentary_heading>.

Next, I want to print the values of all tags named Other-source_anchor, for example:

<Other-source_anchor>DNA binding</Other-source_anchor>

But there is also one with the value GOA that has the same tag name at a higher level. I would like to retrieve only the tags at the level of DNA binding. If I use

foreach($node->getElementsByTagName('Other-source_anchor') as $subnode)

I get no result. If I replace $node with $doc, I retrieve the nodeValue of every element with that tag. How do I make sure that I retrieve only the nodeValues of the Other-source_anchor tags at the level of DNA binding?

Below is the code I wrote:

$esearch_test = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
$result = file_get_contents($esearch_test);

// Load the raw XML string straight into a DOMDocument
$doc = new DOMDocument();
$doc->loadXML($result);
$c = 1;

foreach($doc->getElementsByTagName('Gene-commentary_heading') as $node) {
    if ($node->textContent =="GeneOntology"){
        // echo "<pre>"."$c: ".$node->textContent."</pre>";
        // echo "<pre>"."$c: ".$node->nodeName."</pre>";
        // echo "<pre>"."$c: ".$node->nodeValue."</pre>";
        foreach ($doc->getElementsByTagName('Other-source_anchor') as $subnode){
            echo "<pre>"."$c: ".$subnode->nodeName."</pre>";
            echo "<pre>"."$c: ".$subnode->nodeValue."</pre>"; 
        }
    }

    $c++; # 22: GeneOntology

} 

Part of the XML file used in the code above:

<Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
      <Gene-commentary_source>
        <Other-source>
          <Other-source_pre-text>Provided by</Other-source_pre-text>
          <Other-source_anchor>GOA</Other-source_anchor>
          <Other-source_url>http://www.ebi.ac.uk/GOA/</Other-source_url>
        </Other-source>
      </Gene-commentary_source>
      <Gene-commentary_comment>
        <Gene-commentary>
          <Gene-commentary_type value="comment">254</Gene-commentary_type>
          <Gene-commentary_label>Function</Gene-commentary_label>
          <Gene-commentary_comment>
            <Gene-commentary>
              <Gene-commentary_type value="comment">254</Gene-commentary_type>
              <Gene-commentary_source>
                <Other-source>
                  <Other-source_src>
                    <Dbtag>
                      <Dbtag_db>GO</Dbtag_db>
                      <Dbtag_tag>
                        <Object-id>
                          <Object-id_id>3677</Object-id_id>
                        </Object-id>
                      </Dbtag_tag>
                    </Dbtag>
                  </Other-source_src>
                  <Other-source_anchor>DNA binding</Other-source_anchor>
                  <Other-source_post-text>evidence: IEA</Other-source_post-text>
                </Other-source>
              </Gene-commentary_source>
            </Gene-commentary>
            <Gene-commentary>

Solution

  • Function

    A function that retrieves the values found at a given XPath and returns them as a single string

    function xml_retriever($xml_link, $path){
        // Download the XML and load it straight into a DOMDocument
        $result = file_get_contents($xml_link);
        $doc = new DOMDocument();
        $doc->loadXML($result);
        // Evaluate the XPath expression against the whole document
        $xpath = new DOMXPath($doc);
        $entries = $xpath->query($path);
        // Join the matching values, one per line, separated by '|'
        $attr = '';
        foreach($entries as $node){
            $attr .= '|'.' '.$node->nodeValue. "\r\n";
        }
        // Drop the separator in front of the first value
        return ltrim($attr, '|');
    }
    

    Function test

    A simple test to check that the function works

    # Example query and example path
    $esearch_test = "https://www.ncbi.nlm.nih.gov/gene/7128?report=xml&format=text";
    $query = "/Entrezgene/Entrezgene_properties/Gene-commentary[3]/Gene-commentary_comment/Gene-commentary[1]/Gene-commentary_comment/Gene-commentary[*]/Gene-commentary_source/Other-source/Other-source_anchor";
    # Print the result
    echo xml_retriever($esearch_test,$query);
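
The path above depends on GeneOntology being the third Gene-commentary under Entrezgene_properties. As a possible alternative, the query can start from the heading itself. In the XML excerpt from the question, the heading element contains only text, which is why $node->getElementsByTagName() finds nothing: the Other-source_anchor elements sit inside the heading's sibling elements, not inside the heading. The sketch below is only an illustration under stated assumptions: the inline $sample is a trimmed, hand-written stand-in for the NCBI record, and anchors_under_heading is a hypothetical helper name, not part of the original answer.

```php
<?php
// Hypothetical, trimmed stand-in for the NCBI record: one heading, its
// source (GOA) and one nested term (DNA binding).
$sample = <<<XML
<Gene-commentary>
  <Gene-commentary_heading>GeneOntology</Gene-commentary_heading>
  <Gene-commentary_source>
    <Other-source>
      <Other-source_anchor>GOA</Other-source_anchor>
    </Other-source>
  </Gene-commentary_source>
  <Gene-commentary_comment>
    <Gene-commentary>
      <Gene-commentary_label>Function</Gene-commentary_label>
      <Gene-commentary_comment>
        <Gene-commentary>
          <Gene-commentary_source>
            <Other-source>
              <Other-source_anchor>DNA binding</Other-source_anchor>
            </Other-source>
          </Gene-commentary_source>
        </Gene-commentary>
      </Gene-commentary_comment>
    </Gene-commentary>
  </Gene-commentary_comment>
</Gene-commentary>
XML;

// Hypothetical helper: returns the values of every Other-source_anchor found
// below the Gene-commentary_comment sibling of the heading with the given text.
// Assumes $heading contains no double quotes (it is pasted into the XPath).
function anchors_under_heading(string $xml, string $heading): array {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    // 1. find the heading by its text
    // 2. step sideways to its Gene-commentary_comment sibling
    // 3. collect the anchors at any depth below that sibling
    $query = '//Gene-commentary_heading[. = "' . $heading . '"]'
           . '/following-sibling::Gene-commentary_comment'
           . '//Other-source_anchor';

    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

print_r(anchors_under_heading($sample, 'GeneOntology'));
```

On this sample the result contains only DNA binding. GOA is skipped because it sits under Gene-commentary_source, which the query never enters. Unlike the positional path in the answer, this query collects anchors from every nested Gene-commentary below the heading, so on the full record it would likely return more than the Function terms alone.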