If I have a webpage like this:
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>
  <a href='http://domain2.com'>link 2 text</a>
  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
How do I pull the <a> tags out of the <body> but exclude the links from <header> and <footer>?
In the real web page, there will be a lot of <a> tags in the <header>, so I'd rather not have to cycle through ALL of them.
I want to pull out the URLs and anchor text from each of the <a> tags that are NOT inside the <header> or <footer> tags.
EDIT: this is how I find links in the header:
$header = $html->find('header', 0);
foreach ($header->find('a') as $a) {
    // do something
}
I would like to do something like this (note the use of "!"):
$foo = $html->find('!header,!footer');
foreach ($foo->find('a') as $a) {
    // do something
}
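Remove the header and footer from the DOM you are working with before looking for the links. As far as I know, setting outertext to an empty string only changes what save() outputs, so the document has to be reloaded from the saved markup before find() stops returning the links inside those elements.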
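<?php
require_once "simple_html_dom.php";

$source = <<<EOD
<body>
<header>
<a href='http://domain1.com'>link 1 text</a>
</header>
<a href='http://domain2.com'>link 2 text</a>
<a href='http://domain4.com'>link 4 text</a>
<footer>
<a href='http://domain3.com'>link 3 text</a>
</footer>
</body>
EOD;

$html = str_get_html($source);

// Blank out every <header> and <footer>, including the links inside them.
foreach ($html->find('header, footer') as $unwanted) {
    $unwanted->outertext = "";
}

// Re-parse the modified markup so find() works on the stripped document.
$html->load($html->save());

// Only the links outside <header> and <footer> remain.
foreach ($html->find("a") as $link) {
    // Print the URL and the anchor text of each remaining link.
    echo $link->href, " | ", $link->plaintext, PHP_EOL;
}
?>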
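If you would rather not modify and re-parse the document, here is a minimal sketch of an alternative. Note that it swaps libraries: it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of simple_html_dom, so it is not the approach shown above. The XPath expression selects only the <a> elements that have no <header> or <footer> ancestor, which is roughly what the "!" syntax in the question was asking for. The variable names and the $results array are my own, chosen for illustration.

```php
<?php
$source = <<<EOD
<body>
<header>
<a href='http://domain1.com'>link 1 text</a>
</header>
<a href='http://domain2.com'>link 2 text</a>
<a href='http://domain4.com'>link 4 text</a>
<footer>
<a href='http://domain3.com'>link 3 text</a>
</footer>
</body>
EOD;

// The libxml HTML parser may warn about HTML5 tags such as <header> and
// <footer>, so collect those warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

// Select every <a> that is not nested inside a <header> or a <footer>.
$xpath = new DOMXPath($dom);
$query = '//a[not(ancestor::header) and not(ancestor::footer)]';

// Collect the URL and the anchor text of each matching link.
$results = [];
foreach ($xpath->query($query) as $a) {
    $results[] = [
        'url'  => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}

foreach ($results as $r) {
    echo $r['url'], ' | ', $r['text'], PHP_EOL;
}
```

With this approach the links in the header and footer are filtered out by the query itself, so the loop only ever sees the links you want.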