I'm trying to grab the latest news from a website and include it on my own site. That site uses Joomla (ugh), and the hrefs in the resulting content are relative, so the base href is missing.
So a link will hold just contensite.php?blablabla, which on my site results in the link http://www.example.com/contensite.php?blablabla
So I thought of replacing http:// with http://www.basehref.com before echoing it out, but my knowledge stops here.
Which should I use: preg_replace or str_replace? I'm not sure.
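To make the str_replace versus preg_replace choice concrete, here is a minimal sketch of the string-replacement route. str_replace cannot tell a relative link from one that is already absolute, so this uses preg_replace with a negative lookahead. The sample markup, the variable names, and the pattern itself are assumptions for illustration; it only handles double-quoted href and src attributes and is not a full URL resolver.

```php
<?php
// Assumed base URL of the source site (taken from the question's example).
$base = 'http://www.basehref.com/';

// Assumed sample markup: one relative link and one link that is already absolute.
$html = '<a href="contensite.php?blablabla">News</a> '
      . '<a href="http://www.example.org/page">Other</a>';

// Prefix href/src values with the base URL unless they already start with
// http://, https://, //, # or mailto:. A single leading slash is dropped so
// that the result does not contain a double slash after the host.
$pattern = '~\b(href|src)="(?!https?://|//|#|mailto:)/?~i';
$fixed   = preg_replace($pattern, '$1="' . $base, $html);

echo $fixed, "\n";
// <a href="http://www.basehref.com/contensite.php?blablabla">News</a> <a href="http://www.example.org/page">Other</a>
```

A regex like this is fine for a quick fix on predictable markup, but it will miss single-quoted or unquoted attributes and does not understand paths such as ../, which is where a DOM parser plus a real URL resolver is the safer choice.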
// Connect to my database.
include_once('db_connect.php');
// PEAR Net_URL2, used to resolve relative URLs.
require_once('Net/URL2.php');
// Simple HTML DOM parser.
include_once('dom.php');

// Fetch the target site's HTML and parse it with Simple HTML DOM.
$dom = file_get_html('http://www.targetsite.com');
// Select the first div with class "blog"; this is the block to republish.
$elem2 = $dom->find('div[class=blog]', 0);
// URI of the fetched resource; this should be the same site that was loaded above.
$uri = new Net_URL2('http://www.svvenray.nl');
$baseURI = $uri;
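// A <base href> tag, if present, lives in the page's <head>,
// so search the whole document rather than only the selected div.
foreach ($dom->find('base[href]') as $elem) {
    $baseURI = $uri->resolve($elem->href);
}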
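// Make every src attribute (images, scripts, ...) absolute.
foreach ($elem2->find('*[src]') as $elem) {
    $elem->src = $baseURI->resolve($elem->src)->__toString();
}
// Make every href attribute absolute, skipping any <base> tag itself.
foreach ($elem2->find('*[href]') as $elem) {
    if (strtoupper($elem->tag) === 'BASE') continue;
    $elem->href = $baseURI->resolve($elem->href)->__toString();
}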
echo $elem2;
This will fix all the relative href and src links in the selected block. It requires the PEAR package Net_URL2 (Net/URL2.php).
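If installing the PEAR package is not an option, the resolution step can be roughly approximated in plain PHP. The sketch below is an assumption-laden stand-in, not the Net_URL2 implementation: resolveUrl is a hypothetical helper that covers only the common cases (already absolute, protocol-relative, root-relative, and document-relative URLs). It ignores ports, does not normalise ../ segments, and does not handle query-only references.

```php
<?php
// Hypothetical helper: resolve $rel against $base for the simple cases only.
function resolveUrl(string $base, string $rel): string
{
    // Empty values and pure fragments are left alone.
    if ($rel === '' || $rel[0] === '#') {
        return $rel;
    }
    // Already absolute: the value starts with a scheme such as "http:".
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $rel)) {
        return $rel;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // Protocol-relative: reuse the base scheme.
    if (strpos($rel, '//') === 0) {
        return $parts['scheme'] . ':' . $rel;
    }
    // Root-relative: append to scheme and host.
    if ($rel[0] === '/') {
        return $origin . $rel;
    }
    // Document-relative: append to the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $rel;
}

echo resolveUrl('http://www.basehref.com', 'contensite.php?blablabla'), "\n";
// http://www.basehref.com/contensite.php?blablabla
```

In the loops above, a call such as resolveUrl('http://www.basehref.com', $elem->href) would take the place of $baseURI->resolve(...)->__toString(). For anything beyond these simple cases, a tested resolver like Net_URL2 remains the better choice.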