php xml xpath simplexml

How do I parse this XML feed in PHP?


I need to parse the following ridiculously formed XML feed in PHP. It comes from a real estate company that didn't think to include a 'property type' node; instead they use an 'apartment' node for apartments, a 'house' node for houses, and so on. My problem is that I need to extract the child nodes of these elements to get 'bedrooms', 'sub_type' and other info (this is a very stripped-down version of the original).

<properties>
<property>
    <general_info>
        <id>1</id>
    </general_info>
    <apartment>
        <sub_type>1</sub_type>
        <bedrooms>2</bedrooms>
    </apartment>
</property>
<property>
    <general_info>
        <id>2</id>
    </general_info>
    <house>
        <sub_type>3</sub_type>
        <bedrooms>5</bedrooms>
    </house>
</property>
<property>
    <general_info>
        <id>3</id>
    </general_info>
    <business>
        <sub_type>6</sub_type>
        <bedrooms>0</bedrooms>
    </business>
</property>
</properties>

I'm using simplexml_load_file to 'grab' the feed and performing a foreach loop over the elements. After some research it looks like xpath would help, but I can't get it to work.

Here's the basics of my code:

$xmlObject = simplexml_load_file($XML_FILE_NAME);

foreach($xmlObject->property as $property)
{

    // Get Property sub-type
    $property_sub_types = $property->xpath('//sub_type');
    foreach($property_sub_types as $sub_type)
    {
        print_r($sub_type); // printing to screen for demo purposes
    }
}

This is the output I'm getting. Correct values, but shown 3 times instead of 1.

SimpleXMLElement Object
(
    [0] => 1
)
SimpleXMLElement Object
(
    [0] => 3
)
SimpleXMLElement Object
(
    [0] => 6
)
SimpleXMLElement Object
(
    [0] => 1
)
SimpleXMLElement Object
(
    [0] => 3
)
SimpleXMLElement Object
(
    [0] => 6
)
SimpleXMLElement Object
(
    [0] => 1
)
SimpleXMLElement Object
(
    [0] => 3
)
SimpleXMLElement Object
(
    [0] => 6
)

If anyone can point me in the right direction, it'd be much appreciated. Oh, and before you ask, getting them to redo their feed is not an option.


Solution

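  • Your XPath //sub_type selects all sub_type elements in the whole document,
    because a leading // always starts from the document root, no matter which
    element you call xpath() on. Changing it to .//sub_type, which searches
    only below the current property, should help.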
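
As a rough sketch of that fix, the snippet below loads an inline copy of the sample feed with simplexml_load_string (the question uses simplexml_load_file on a real file) and runs the relative query inside the loop. The second query, ./*[sub_type], together with getName(), is an extra suggestion not in the original answer: it assumes each property has exactly one child element containing a sub_type, and uses that element's name as the property type. The variable names $subTypes, $rows and $details are made up for illustration.

```php
<?php
// Inline copy of the sample feed from the question, with the root element closed.
$xml = <<<XML
<properties>
<property>
    <general_info><id>1</id></general_info>
    <apartment><sub_type>1</sub_type><bedrooms>2</bedrooms></apartment>
</property>
<property>
    <general_info><id>2</id></general_info>
    <house><sub_type>3</sub_type><bedrooms>5</bedrooms></house>
</property>
<property>
    <general_info><id>3</id></general_info>
    <business><sub_type>6</sub_type><bedrooms>0</bedrooms></business>
</property>
</properties>
XML;

$xmlObject = simplexml_load_string($xml);

$subTypes = []; // one sub_type per property, collected with the corrected query
$rows     = []; // assumption: the wrapping element's name is the property type

foreach ($xmlObject->property as $property) {
    // The leading "." makes the search relative to the current <property>,
    // so only this property's sub_type is returned.
    foreach ($property->xpath('.//sub_type') as $sub_type) {
        $subTypes[] = (string) $sub_type;
    }

    // Direct children of this <property> that themselves contain a <sub_type>
    // (apartment, house, business, ...).
    foreach ($property->xpath('./*[sub_type]') as $details) {
        $rows[] = [
            'id'       => (string) $property->general_info->id,
            'type'     => $details->getName(),
            'sub_type' => (string) $details->sub_type,
            'bedrooms' => (int) $details->bedrooms,
        ];
    }
}

print_r($subTypes);
print_r($rows);
```

Casting each SimpleXMLElement to string or int gives plain values instead of the nested objects print_r shows in the question.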