php xpath domxpath

Parsing a table: can't get more than 3 rows using DOMXPath


For some weird reason that I can't figure out, I can't fetch more than 3 rows from a table on a page.

This is the page.

http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/

I want to parse the table at the bottom.

Since there is only one table on the page, I kept my XPath really simple: $xpath->query('//tr')

If I do the following

echo $xpath->query('//tr')->length;

I get 3

Why am I getting 3? There are 9 rows in the table, so I should get 9.


Edit: This is the code I use:

$Dom = new DOMDocument();
@$Dom->loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath->query('//tr')->length;

Please note that $this->html is the raw HTML of the page linked above.


Solution

  • The HTML source of this page is not well-formed. If you open the page source and search for the <tr> tag, you will also find only 3 of them. The product rows in the table do not have an opening <tr> tag, so the parser does not see them as rows.

    To work around this, you can use a regular expression and a string replacement to normalize the contents of the table before parsing it.

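    $html = file_get_contents('http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/');

    // The "s" modifier lets "." match newlines, in case the table body spans several lines.
    $tableBody = '';
    if (preg_match('`<tbody>(.*)</tbody>`s', $html, $matches)) {
        // Add the missing opening <tr> before every row that starts right after a </tr>.
        $tableBody = str_replace('</tr><td', '</tr><tr><td', $matches[1]);
    }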
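
The snippet above stops once the table body has been normalized. As a rough sketch of the remaining step, the repaired markup can be wrapped in a <table> tag and passed back to DOMDocument, after which //tr should match every row. The sample HTML string and the helper names extractNormalizedBody() and parseRows() are made up for illustration and are not taken from the real page; the sketch also assumes, as the replacement above does, that there is no whitespace between </tr> and the following <td>.

```php
<?php
// Hypothetical sample that mimics the malformed markup described above:
// only the first row has an opening <tr>, the others only have a closing </tr>.
$html = '<table><tbody>'
      . '<tr><td>A1</td><td>A2</td></tr>'
      . '<td>B1</td><td>B2</td></tr>'
      . '<td>C1</td><td>C2</td></tr>'
      . '</tbody></table>';

// Extract the <tbody> contents and add the missing opening <tr> tags.
// Returns null when no <tbody> is found.
function extractNormalizedBody(string $html): ?string
{
    if (!preg_match('`<tbody>(.*?)</tbody>`s', $html, $matches)) {
        return null;
    }
    return str_replace('</tr><td', '</tr><tr><td', $matches[1]);
}

// Parse the normalized rows and return the cell texts, one array per row.
function parseRows(string $tableBody): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with "@".
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<table>' . $tableBody . '</table>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$body = extractNormalizedBody($html);
$rows = $body === null ? [] : parseRows($body);

echo count($rows), "\n";
```

With the real page, $html would come from file_get_contents() as in the snippet above, and the row count should then match the number of rows visible in the table, provided the markup follows the same pattern as this sample.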