Missing Attributes while parsing XML with simplexml_load_string


$xml = simplexml_load_string($value);
$json = json_encode($xml); // encode the SimpleXML object as JSON
$array = json_decode($json, true); // decode the JSON into an associative array

The attributes are missing after converting the XML into an array.


Solution

  • The <SampleData> value is, as you say, encoded. The simplest way to get it back to 'normal' is to use htmlspecialchars_decode() to convert the encoded symbols before loading the string into SimpleXML. The code below does this and then outputs various parts of the data as an example of how to display the information.

    $source = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=biosample&id=367368";
    $value = file_get_contents($source);
    $value = htmlspecialchars_decode($value);
    $xml = simplexml_load_string($value);
    // Access the DbBuild value
    echo "DbBuild=".(string)$xml->DocumentSummarySet->DbBuild.PHP_EOL;
    // The BioSample publication date attribute
    echo "BioSample publication date=".(string)$xml->DocumentSummarySet->DocumentSummary->SampleData->BioSample['publication_date'].PHP_EOL;
    // List each attribute's name and value
    foreach ($xml->DocumentSummarySet->DocumentSummary->SampleData->BioSample->Attributes->Attribute as $attribute) {
        echo (string)$attribute['attribute_name']."=".(string)$attribute.PHP_EOL;
    }
    

    Some of the XML access looks long-winded, but it is just a matter of stepping through the levels of data in the document. $xml->DocumentSummarySet accesses the <DocumentSummarySet> element under the root element, BioSample['publication_date'] is the publication_date attribute of the <BioSample> element, and so on.
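
Since the original goal was an array, the same access pattern can also be used to collect the attributes by hand rather than going through json_encode(). The following is only a minimal sketch: the inline XML is a made-up sample that imitates the shape described above (its element values and dates are hypothetical, not taken from the real response), and the variable names $bioSample, $attributes and $publicationDate are introduced here purely for illustration.

```php
<?php
// Hypothetical sample imitating the response shape discussed above:
// the content of <SampleData> arrives entity-encoded.
$value = <<<'XML'
<eSummaryResult>
  <DocumentSummarySet status="OK">
    <DbBuild>Build-example</DbBuild>
    <DocumentSummary uid="1">
      <SampleData>&lt;BioSample publication_date="2000-01-01"&gt;
        &lt;Attributes&gt;
          &lt;Attribute attribute_name="strain"&gt;K-12&lt;/Attribute&gt;
          &lt;Attribute attribute_name="host"&gt;Homo sapiens&lt;/Attribute&gt;
        &lt;/Attributes&gt;
      &lt;/BioSample&gt;</SampleData>
    </DocumentSummary>
  </DocumentSummarySet>
</eSummaryResult>
XML;

// Decode the entities first so <BioSample> becomes real markup.
$xml = simplexml_load_string(htmlspecialchars_decode($value));
if ($xml === false) {
    throw new RuntimeException('Could not parse the decoded XML');
}

$bioSample = $xml->DocumentSummarySet->DocumentSummary->SampleData->BioSample;

// Attributes of an element are read with array syntax.
$publicationDate = (string)$bioSample['publication_date'];

// Build attribute_name => text pairs as a plain PHP array.
$attributes = [];
foreach ($bioSample->Attributes->Attribute as $attribute) {
    $attributes[(string)$attribute['attribute_name']] = (string)$attribute;
}

echo "publication_date=" . $publicationDate . PHP_EOL;
print_r($attributes);
```

Decoding the whole document is simple, but it assumes that no other text in the response contains encoded characters that would turn into invalid markup once decoded. If that is a concern, it may be safer to check the result of simplexml_load_string() as done here.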