I want to insert some elements into my database, but I want $pavadinimas and $kaina to end up in the same row, not in separate ones. It would also be great if I could collect the elements from all pages of the website, but when I insert more than 2 links, refreshing my page fails with an error saying the page could not load. Here is my code. Thanks for the help!
<?php // example of how to modify HTML contents
include_once('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('https://www.varle.lt/mobilieji-telefonai/');
foreach($html->find('span[class=inner]') as $pavadinimas) {
$pavadinimas = str_replace("<span class=", " ", $pavadinimas);
$pavadinimas = str_replace("inner>", " ", $pavadinimas);
$pavadinimas = str_replace("<span>", " ", $pavadinimas);
$pavadinimas = str_replace("</span></span>", " ", $pavadinimas);
$pavadinimas = str_replace('"inner"> ', " ", $pavadinimas);
}
foreach($html->find('span[class=price]') as $kaina) {
$kaina = str_replace("Lt", " ", $kaina);
$kaina = str_replace("<span class=", " ", $kaina);
$kaina = str_replace("price", " ", $kaina);
$kaina = str_replace("</span>", " ", $kaina);
$kaina = str_replace(",<sup>99</sup>", " ", $kaina);
$kaina = str_replace(",<sup>99</sup>", " ", $kaina);
$kaina = str_replace(" ", " ", $kaina);
$kaina = str_replace('" ">', " ", $kaina);
$kaina = str_replace(" ", " ", $kaina);
$query = "insert into telefonai (pavadinimas,kaina) VALUES (?,?)";
$this->db->query($query, array($pavadinimas,$kaina));
}
?>
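Proceed step by step.
Start by getting all the wanted info from one page (the first one, for example). The idea is to grab every phone block, each being an a element with a data-id attribute, and then read the name, the price, and the link from inside each block: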
$phones = $html->find('a[data-id]');
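Once a block is in hand, the price text still needs cleaning. Here is a minimal sketch of that step in plain PHP, assuming the text looks roughly like "1099,99 Lt" (a guess based on the markup in the question). The extractPrice function is a hypothetical helper, not part of simple_html_dom.

```php
<?php
// Hypothetical helper, not part of simple_html_dom.
// Pulls the numeric part out of a price string such as "1099,99 Lt".
function extractPrice($text, $withDecimals = false)
{
    // With decimals: digits, optionally followed by a comma and more digits.
    // Without decimals: only the first run of digits.
    $pattern = $withDecimals ? '@(\d+(?:,\d+)?)@' : '@(\d+)@';

    // preg_match() returns 1 on a match and 0 when nothing matches.
    if (preg_match($pattern, $text, $matches) !== 1) {
        return null; // no digits found in the text
    }
    return $matches[1];
}

$whole = extractPrice('1099,99 Lt');       // integer part only
$full  = extractPrice('1099,99 Lt', true); // keeps the part after the comma
$none  = extractPrice('Lt');               // no digits at all
```

Returning null when nothing matches avoids the undefined index notice that reading $matches[1] directly would raise on an unexpected price format.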
Now that the code works for one page, let's make it work for all pages, knowing that each page links to the following one through a Next button, so we stop when this link cannot be found.
So here is the code summarizing everything said above:
$url = "https://www.varle.lt/mobilieji-telefonai/";
// Start from the main page
$nextLink = $url;
// Loop on each next link as long as it exists
while ($nextLink) {
echo "<hr>nextLink: $nextLink<br>";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a url
$html->load_file($nextLink);
/////////////////////////////////////////////////////////////
/// Get phone blocks and extract info (also insert to db) ///
/////////////////////////////////////////////////////////////
$phones = $html->find('a[data-id]');
foreach($phones as $phone) {
// Get the link
$linkas = $phone->href;
// Get the name
$pavadinimas = $phone->find('span[class=inner]', 0)->plaintext;
// Get the price text and extract the useful part using a regex
$kaina = $phone->find('span[class=price]', 0)->plaintext;
// This captures the integer part of decimal numbers: in "123,45" it captures "123". Use @([\d,]+),?@ to capture the decimal part too
preg_match('@(\d+),?@', $kaina, $matches);
// Fall back to NULL when the price text contains no digits
$kaina = isset($matches[1]) ? $matches[1] : NULL;
echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";
// INSERT INTO DB HERE
// CODE
// ...
}
/////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////
// Extract the next link, if not found return NULL
$nextLink = ( ($temp = $html->find('div.pagination a[class="next"]', 0)) ? "https://www.varle.lt".$temp->href : NULL );
// Clear DOM object
$html->clear();
unset($html);
}
Output
nextLink: https://www.varle.lt/mobilieji-telefonai/
Samsung Phone I9300 Galaxy SIII Juodas #----# 1099 #----# https://www.varle.lt/mobilieji-telefonai/samsung-phone-i9300-galaxy-siii-juodas.html
Samsung Galaxy S2 Plus I9105 Pilkai mėlynas #----# 739 #----# https://www.varle.lt/mobilieji-telefonai/samsung-galaxy-s2-plus-i9105-pilkai-melynas.html
Samsung Phone S7562 Galaxy S Duos baltas #----# 555 #----# https://www.varle.lt/mobilieji-telefonai/samsung-phone-s7562-galaxy-s-duos-baltas--457135.html
...
nextLink: https://www.varle.lt/mobilieji-telefonai/?p=2
LG T375 Mobile Phone Black #----# 218 #----# https://www.varle.lt/mobilieji-telefonai/lg-t375-mobile-phone-black.html
Samsung S6802 Galaxy Ace Duos black #----# 579 #----# https://www.varle.lt/mobilieji-telefonai/samsung-s6802-galaxy-ace-duos-black.html
Mobilus telefonas Samsung Galaxy Ace Onyx Black | S5830 #----# 559 #----# https://www.varle.lt/mobilieji-telefonai/mobilus-telefonas-samsung-galaxy-ace-onyx-black.html
...
...
...
Note that the code may take a while to parse all the pages, so PHP may stop with this error: Fatal error: Maximum execution time of 30 seconds exceeded ...
If that happens, simply extend the maximum execution time like this:
ini_set('max_execution_time', 300); //300 seconds = 5 minutes
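For the "INSERT INTO DB HERE" placeholder in the loop, here is one possible sketch using PDO with a prepared statement. It is an assumption, not the asker's setup: the question calls $this->db->query(), which looks like a framework database wrapper, and PDO is swapped in here only because it ships with PHP. The in-memory SQLite database is likewise a stand-in so the sketch runs on its own. The table and column names come from the question, and the two sample rows are taken from the output shown earlier.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one,
// so the sketch runs on its own. Swap the DSN for your actual server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE telefonai (pavadinimas TEXT, kaina INTEGER)');

// Prepare once, execute once per phone: name and price land in the same row.
$insert = $db->prepare('INSERT INTO telefonai (pavadinimas, kaina) VALUES (?, ?)');

// Sample data; in the scraper these values come from the foreach loop.
$phones = array(
    array('Samsung Phone I9300 Galaxy SIII Juodas', 1099),
    array('LG T375 Mobile Phone Black', 218),
);
foreach ($phones as $phone) {
    list($pavadinimas, $kaina) = $phone;
    $insert->execute(array($pavadinimas, $kaina));
}

// Count the rows to confirm one row was written per phone.
$rowCount = (int) $db->query('SELECT COUNT(*) FROM telefonai')->fetchColumn();
echo $rowCount, " rows inserted\n";
```

Because the insert runs inside the same loop iteration that reads both values, each name is stored next to its own price, which is what the two separate foreach loops in the question could not guarantee.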