I am trying to write a script that loads URLs from sitemap.xml and puts them into an array. It should then load all the pages one by one and print something after each one.
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
// parse the XML, push the links into the array
foreach ($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}

// fetch each page, then print its URL
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);
    echo $url."<br />";
    flush();
    ob_flush();
}
?>
It still doesn't work. It loads for a very long time and doesn't print anything. I think flush() isn't working.
Does anybody see the problem?
Thank you very much, Filip
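The usual suspect is output buffering: ob_flush() should come before flush(), and PHP or the web server may still buffer or compress the whole response, in which case nothing reaches the browser until the script finishes. Here is a minimal sketch of just the output side (using file_get_contents() to keep it short; the fetch mechanism doesn't matter for flushing), assuming $urls is already filled as in your script and that no server-side buffering (e.g. gzip in the web server) gets in the way:
<?php
set_time_limit(6000);

// Turn off zlib output compression, which would otherwise buffer the
// whole response, and flush automatically after every output call.
ini_set('zlib.output_compression', 'Off');
ob_implicit_flush(true);

foreach ($urls as $url) {
    $data = file_get_contents($url);
    echo $url."<br />";
    if (ob_get_level() > 0) {
        ob_flush(); // empty PHP's output buffer first, if one is active
    }
    flush();        // then push the data out to the client
}
?>
Even with correct flushing, the page can still look stuck simply because each fetch blocks until the remote page has finished downloading.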
That said, I would run something like this:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
foreach ($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}

foreach ($urls as $url) {
    $data = file_get_contents($url);
    echo $url."<br />". $data;
}
?>
Or, even better, with a single loop instead of two:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
foreach ($DomNodeList as $url) {
    $curURL = $url->nodeValue;
    $urls[] = $curURL;
    $data = file_get_contents($curURL);
    echo $curURL."<br />". $data;
}
?>
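One caveat with either version: file_get_contents() blocks until the remote server answers and returns false on failure, so one slow or dead URL stalls the whole run. Here is a sketch of the same loop with a per-request timeout and a basic error check; the 30-second timeout and the "failed" message are my own choices, not something from the original:
<?php
// Assumes $DomNodeList and $urls exist as in the snippet above.
// The stream context makes each request give up after a while
// instead of hanging indefinitely.
$context = stream_context_create(array(
    'http' => array('timeout' => 30) // seconds per request, adjust to taste
));

foreach ($DomNodeList as $url) {
    $curURL = $url->nodeValue;
    $urls[] = $curURL;
    $data = @file_get_contents($curURL, false, $context);
    if ($data === false) {
        echo $curURL." failed<br />";
        continue;
    }
    echo $curURL."<br />". $data;
}
?>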