Why does preg_match() on an XPath nodeValue raise "Undefined offset: 1"?


I tried to extract the ID that follows "Property ID :" with the code below:

<?php
$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL);
$xpath = new DOMXPath($dom);

/*echo $xpath->evaluate("normalize-space(substring-before(substring-after(//p[contains(text(),'Property ID:')][1], 'Property ID:'), '–'))");*/

$id = $xpath->evaluate('//div[contains(@class,"property-table")]')->item(0)->nodeValue;
preg_match("/Property ID :(.*)/", $id, $matches);

echo $matches[1];

But it doesn't work:

Notice: Undefined offset: 1 in W:\Xampp\htdocs\X\index.php on line 12

What is wrong? If I create a string like this:

$id ="Property Details Property Type : Apartment Price $ 350 pm Building Size 72 Sqms Property ID : 1001192296";

When I use that string in my code instead of the XPath result, it works. So what is the difference between a string I create myself and the one I get from XPath? Thanks in advance for your help.


Solution

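  • Your preg_match() fails because the nodeValue returned by the XPath query is not a single-line string. It contains line breaks and runs of spaces, so the literal text "Property ID :" never occurs in it, nothing matches, and $matches[1] is never set. The raw value is exactly this: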

    Property Details
    
                                Property Type : 
                             Apartment 
    
    
                        Price
                        $ 350 pm
    
    
                    Building Size
                    72 Sqms
    
    
                    Property ID 
                     : 
                    1001192296
    

    So collapse the whitespace before matching, like this:

    $getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
    $dom = new DOMDocument();
    @$dom->loadHTML($getURL);
    $xpath = new DOMXPath($dom);
    
    /*echo $xpath->evaluate("normalize-space(substring-before(substring-after(//p[contains(text(),'Property ID:')][1], 'Property ID:'), '–'))");*/
    
    $id = $xpath->evaluate('//div[contains(@class,"property-table")]')->item(0)->nodeValue;
    
    $id = preg_replace('!\s+!', ' ', $id);
    
    preg_match("/Property ID :(.*)/", $id, $matches);
    
    echo $matches[1];
    

    The line $id = preg_replace('!\s+!', ' ', $id); collapses every run of whitespace between the words (spaces, tabs, line breaks) into a single space.
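
    The effect can be reproduced without fetching the page. The following is a minimal sketch, assuming a hard-coded sample string that imitates the raw nodeValue shown above; the helper name extractPropertyId() and the slightly more tolerant pattern are made up for illustration and are not part of the original code. It also checks the return value of preg_match() before reading $matches[1], which is what avoids the notice when nothing matches.

    ```php
    <?php
    // Hypothetical helper (not from the original post): returns the ID or null.
    function extractPropertyId(string $text): ?string
    {
        // Collapse every run of whitespace (spaces, tabs, newlines) into one space.
        $normalized = trim(preg_replace('/\s+/', ' ', $text));

        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match('/Property ID ?: ?(\S+)/', $normalized, $matches) === 1) {
            return $matches[1];
        }
        return null; // no match: the caller decides what to do, no notice is raised
    }

    // Sample text imitating the raw nodeValue (assumed layout with line breaks and padding).
    $raw = "Property Details\n\n      Property Type : \n   Apartment \n\n"
         . "   Property ID \n    : \n   1001192296\n";

    // The original literal pattern finds nothing in the raw text.
    var_dump(preg_match('/Property ID :(.*)/', $raw)); // int(0)

    // After normalization the ID is found.
    var_dump(extractPropertyId($raw)); // string(10) "1001192296"
    ```

    Returning null instead of echoing $matches[1] directly keeps the "no match" case explicit.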

    Update: Following the comment below, I now take the text of the whole page with $xpath->evaluate() and match all property IDs, both the purely numeric ones and those of the form P-digits.

    $getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
    
    $dom = new DOMDocument();
    @$dom->loadHTML($getURL);
    
    $xpath = new DOMXPath($dom);
    
    // this only returns the text of the whole page without html tags
    $id = $xpath->evaluate( "//html" )->item(0)->nodeValue;
    $id = preg_replace('!\s+!', ' ', $id);
    
    // not an elegant regex, but it matches both kinds of property ID
    preg_match_all("/Property ID( |):[ |]((\w{0,1}[-]|)\d*)/", $id, $matches);
    
    // because of the extra capture group, the IDs are now in $matches[2]
    foreach( $matches[2] as $property_id ) {
        echo $property_id."<br>";
    }
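
    As an alternative to preg_replace(), the whitespace can be collapsed inside the XPath expression itself with normalize-space(), in the spirit of the commented-out line in the question. This is only a sketch under stated assumptions: the inline HTML below is invented sample markup (the real page structure is not shown here), and the simplified pattern is an illustration rather than a drop-in replacement for the regex above.

    ```php
    <?php
    // Assumed sample markup; the real page may be structured differently.
    $html = '<html><body><div class="property-table"><h3>Property Details</h3> '
          . "<p>Property ID \n     : \n     P-1001192296\n</p></div></body></html>";

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of silencing with @
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // normalize-space() trims the string value of the first matching node
    // and collapses internal whitespace runs, so evaluate() returns a plain string.
    $text = $xpath->evaluate(
        'normalize-space(//div[contains(@class, "property-table")])'
    );

    // Optional one-character prefix plus "-" (e.g. "P-"), then the digits.
    $id = null;
    if (preg_match('/Property ID ?: ?((?:\w-)?\d+)/', $text, $m) === 1) {
        $id = $m[1];
    }

    var_dump($id); // string(12) "P-1001192296"
    ```

    With a non-capturing group for the prefix, the ID ends up in the first capture group, so there is no need to remember which index holds it.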