
How to scrape a page element using XPath


I want to get the e-mail address from this element using XPath:

<td>
<span id="A-1_id_1151_1997" class="">info@alexianer.com</span>
</td>

I have tried many approaches; this is one of them:

$html = new DOMDocument();
// @ suppresses the libxml warnings emitted for malformed HTML on the live page
@$html->loadHtmlFile('http://www.deutsches-krankenhaus-verzeichnis.de/suche/Krankenhaus/260530089-00-1.1/Alexianer-Aachen-GmbH.jsf');
$xpath = new DOMXPath($html);
// Positional path to the table cell that should contain the e-mail address
$nodelist = $xpath->query('//*[@id="accordion"]/table[4]/tbody/tr[2]/td[7]');
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}

If I use the id, the e-mail address is displayed, but with the td path it is not. I can't rely on the id, because the page is dynamic and the id changes on every page. I think the problem is with nodeValue, but I couldn't figure it out.

Any solution would be appreciated.


Solution

  • Examining http://www.deutsches-krankenhaus-verzeichnis.de/suche/Krankenhaus/260530089-00-1.1/Alexianer-Aachen-GmbH.jsf, it seems to me you can grab the nodes you want with something like the following XPath expression:

    //table[*[@class="tablehead"]/td/*[text()="E-Mail"]]//tr[2]/td[7]
    

    That is, translated into prose: “Find any table that has a child element whose class attribute value is tablehead, which in turn has a td child, which in turn has a child element whose text is ‘E-Mail’. If you find such a table, get the 7th td child of each tr inside it that is the 2nd tr child of its parent.”

    If you only want a td that contains a specific e-mail address, check that the string value of the entire node matches that address; and if you only want the first such node, wrap the whole expression in parentheses and apply the [1] position predicate to it:

    (//table[*[@class="tablehead"]/td/*[text()="E-Mail"]]//tr[2]/td[7][.="info@alexianer-aachen.de"])[1]
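To see both expressions in action without hitting the live site, here is a minimal PHP sketch that runs them against a small hand-written HTML fragment. The fragment, its column headers, and the `$query`, `$emails` and `$first` names are illustrative assumptions, not the real page markup, which may differ. The address in the fragment is the one from the question's snippet, so the filter uses that address rather than the one in the answer. It also swaps the `@` operator for `libxml_use_internal_errors()`, which silences parser warnings in the same way.

```php
<?php
// Minimal sketch: run the answer's XPath expressions against a small,
// hypothetical HTML fragment (assumed layout, not the live page), so no
// network access is needed.

$html = <<<'HTML'
<html><body>
<div id="accordion">
<table>
  <tr class="tablehead">
    <td><span>Name</span></td>
    <td><span>Street</span></td>
    <td><span>ZIP</span></td>
    <td><span>City</span></td>
    <td><span>Phone</span></td>
    <td><span>Fax</span></td>
    <td><span>E-Mail</span></td>
  </tr>
  <tr>
    <td>Alexianer Aachen GmbH</td>
    <td>-</td>
    <td>-</td>
    <td>Aachen</td>
    <td>-</td>
    <td>-</td>
    <td><span id="A-1_id_1151_1997" class="">info@alexianer.com</span></td>
  </tr>
</table>
</div>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of printing them
// (an alternative to the @ operator used in the question).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Locate the table by its "E-Mail" header label instead of by an id that
// changes from page to page, then take the 7th cell of the 2nd row.
$query = '//table[*[@class="tablehead"]/td/*[text()="E-Mail"]]//tr[2]/td[7]';
$emails = [];
foreach ($xpath->query($query) as $cell) {
    $emails[] = trim($cell->nodeValue);
}
echo implode("\n", $emails), "\n";

// Keep only the first cell whose whole string value equals one address.
$first = $xpath->query('(' . $query . '[.="info@alexianer.com"])[1]');
echo $first->length, "\n";
```

Anchoring on the visible "E-Mail" label rather than on the generated id is the point of the answer's expression: the label stays stable from page to page while the id does not. Against the live page, you would load it with `loadHTMLFile()` as in the question and keep the same queries.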