I have a function that gets a specific link from a specific website, and it works on its own. The problem starts when I call it in a while loop: the links get longer and longer, as if parts of the previous URLs were stacking up.
function getLinks($link) {
$link1 = $link;
$content = file_get_contents($link1);
$content = str_replace("<", "", $content);
$content = str_replace(">", "", $content);
preg_match("~previous page.+?next page~i", $content, $match);
preg_match("~\"(/.+?)\"~i", $match[0], $match);
$link2 = "https://en.wiktionary.org".$match[1];
echo $link1."<br>";
echo $link2."<br>";
return $link2;
}
$firstLink = getLinks("https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages");
Result of $firstLink = getLinks(...):
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
^--- See how it works fine when it's like this? Then when I put it in a while loop:
$count = 0;
while ($count < 5) {
$count++;
$firstLink = getLinks($firstLink);
}
The results come out totally messed up, and the links start to stack up on each other, like so:
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=BAGSIE%0Abagsie#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bpagefrom=BAGSIE%0Abagsie&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
https://en.wiktionary.org/w/index.php?title=Category:English_verbs&%3Bamp%3Bamp%3Bamp%3Bamp%3Bpagefrom=BAGSIE%0Abagsie&%3Bamp%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bamp%3Bpagefrom=ACETIFY%0Aacetify&%3Bpagefrom=ACETIFY%0Aacetify&pagefrom=ACETIFY%0Aacetify#mw-pages
This is driving me insane, so if anyone knows what I did wrong, please, please tell me. Thank you.
Regular function in while loop:
function addOne($num) {
echo $num."<br>";
$num++;
return $num;
}
$num = 0;
$count = 0;
while ($count < 5) {
$count++;
$num = addOne($num);
}
^---Works just fine
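Your problem is with HTML entities. The link you extract comes from the page's HTML source, where the "&" inside an href is written as "&amp;". Each pass through the loop feeds that undecoded URL back in, so the leftover fragment piles up as an extra query parameter. I've rewritten the function to decode the link with html_entity_decode() before reusing it, to skip URLs that were already visited, and to make it more efficient. You call it with a depth parameter, which in your case would be the maximum of your while loop.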
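function getLinks($linkd, $depth, $checked = array()) {
    // Accept either a single URL or an array of URLs.
    if (!is_array($linkd)) $linkd = array($linkd);
    foreach ($linkd as $link) {
        // Skip URLs that were already visited.
        if (isset($checked[$link])) continue;
        $content = file_get_contents($link);
        if ($content === false) continue;
        $content = str_replace(array("<", ">"), "", $content);
        // Isolate the text between "previous page" and "next page",
        // then take the first quoted site-relative path inside it.
        if (!preg_match("~previous page.+?next page~i", $content, $match)) continue;
        if (!preg_match("~\"(/.+?)\"~i", $match[0], $match)) continue;
        // The href comes from HTML source, so turn &amp; back into & before reuse.
        $link2 = "https://en.wiktionary.org" . html_entity_decode($match[1]);
        echo $link . "<br>";
        echo $link2 . "<br>";
        $checked[$link] = true;
        if ($depth > 0) {
            // Follow the next link with one less level of depth.
            return getLinks($link2, $depth - 1, $checked);
        }
        return $link2;
    }
}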
$firstLink = "https://en.wiktionary.org/w/index.php?title=Category:English_verbs&pagefrom=AUTOPILOT%0Aautopilot#mw-pages";
$firstLink = getLinks($firstLink, 5);
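To see the entity issue in isolation, without touching the network, here is a minimal sketch. The helper name hrefToUrl and the sample href are made up for illustration; the only assumption is that the href in the page source has its "&" escaped as "&amp;", which is what the stacked-up output in the question suggests.

```php
<?php
// Hypothetical helper: turn an href copied from HTML source into a usable URL.
function hrefToUrl($href, $base = "https://en.wiktionary.org") {
    // html_entity_decode() converts &amp; back to a plain & (among other entities).
    return $base . html_entity_decode($href);
}

// Sample href as it might appear in the raw HTML (assumed, not fetched).
$href = "/w/index.php?title=Category:English_verbs&amp;pagefrom=BAGSIE%0Abagsie#mw-pages";

// Without decoding, "amp;pagefrom" would be sent to the server as a parameter name.
echo "https://en.wiktionary.org" . $href, "\n";

// With decoding, the URL has the intended "pagefrom" parameter.
echo hrefToUrl($href), "\n";
```

If you would rather keep your original while loop, the same idea applies: decode $match[1] with html_entity_decode() before building $link2, and the loop should stop accumulating parameters.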