I have been writing a web crawler, and I am trying to compare the previous URL (the last site visited) with the current URL (the current or next site to visit). To do this I am using `strcmp` like this:
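```php
// Append the site about to be visited to the list of sites seen so far.
array_push($currentsite, $source);
// strcmp() returns 0 only when both strings are byte-for-byte identical.
if (strcmp($currentsite[2], $currentsite[3]) == 0) {
    echo "redundancy";
    crawlWebsite($originalsource);
}
```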
Here `$currentsite` is an array holding the previously visited sites plus the current site. The larger program moves on to each new site recursively.
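A minimal sketch of the comparison described above, assuming `$currentsite` is a plain list of URL strings, oldest first. Instead of the fixed indices 2 and 3, it compares the last two entries, and it `trim()`s them first so that a stray newline or space (invisible when echoed) does not make otherwise identical URLs differ. The function name `isRepeatVisit` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns true when the most recently pushed URL
 * is identical to the one visited just before it.
 *
 * @param string[] $history list of visited URLs, oldest first
 */
function isRepeatVisit(array $history): bool
{
    if (count($history) < 2) {
        // Nothing to compare against yet.
        return false;
    }

    // array_slice() with a negative offset takes the last two entries
    // and re-indexes them as 0 and 1, whatever the original keys were.
    [$previous, $current] = array_slice($history, -2);

    // trim() strips surrounding whitespace such as "\n" or "\r",
    // which would otherwise make strcmp() return a non-zero value.
    return strcmp(trim($previous), trim($current)) === 0;
}

$currentsite = ['http://example.com/a', 'http://example.com/b'];
array_push($currentsite, "http://example.com/b\n");
var_dump(isRepeatVisit($currentsite)); // bool(true)
```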
However, every time I run `strcmp` on the current site and the new site, I get a result of -1, even when the URLs are identical. Does anyone know why this might consistently be happening?
Thanks.
The site you are testing probably contains something that makes each copy of the page unique, such as the current time or a hidden ID used to keep track of your session.
Either way, that will cause `strcmp` not to return 0. It would be better to use a function that gives you a percentage of similarity, so you can define a threshold above which you consider two sites identical.
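One way to get such a percentage in PHP is the built-in `similar_text()`, which writes the similarity of two strings (0 to 100) into its optional third, by-reference argument. The sketch below assumes you already have the HTML of both pages as strings; the function name `pagesLookIdentical` and the 95% threshold are illustrative assumptions that you would tune for your crawler.

```php
<?php

/**
 * Hypothetical helper: treat two pages as the same when their content
 * is at least $threshold percent similar, instead of requiring the
 * byte-for-byte equality that strcmp() checks.
 */
function pagesLookIdentical(string $pageA, string $pageB, float $threshold = 95.0): bool
{
    if ($pageA === $pageB) {
        // Fast path: exact match, skip the expensive comparison.
        return true;
    }

    // similar_text() stores the similarity as a percentage in $percent.
    similar_text($pageA, $pageB, $percent);

    return $percent >= $threshold;
}

// Two pages that differ only in an embedded timestamp.
$first  = '<html><body>Welcome! Generated at 12:00:01</body></html>';
$second = '<html><body>Welcome! Generated at 12:00:02</body></html>';

var_dump(strcmp($first, $second) === 0);      // bool(false)
var_dump(pagesLookIdentical($first, $second)); // bool(true)
```

Note that the PHP manual describes `similar_text()` as O(N³) in the length of the longer string, so on large pages you may want to compare only part of the content, or strip known dynamic parts first.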