
PHP strcmp question


I have been writing a web crawler, and I am trying to compare the previous URL (the last site visited) with the current URL (the current or next site to visit). To do this I am using strcmp like this:

array_push($currentsite, $source);
if (strcmp($currentsite[2], $currentsite[3]) == 0) {
    echo "redundancy";
    crawlWebsite($originalsource);
}

where $currentsite is an array holding the previous sites followed by the current site. The larger program uses recursion to loop through new sites.

However, every time I run strcmp on the current site and the new site, I get a result of -1, even when the URLs are identical. Does anyone know why this might consistently be happening?

Thanks.


Solution

  • The site you are testing probably contains something that makes each response unique, such as the current time or a hidden ID used to track your session.

    Any such difference means strcmp will not return 0. It would be better to use a function that gives you a percentage of similarity, so you can define a threshold above which you consider two sites identical.
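
    The percentage idea above can be sketched with PHP's built-in similar_text(), which reports similarity as a percentage through its third argument. This is only a minimal sketch: the helper name isProbablySamePage, the default threshold of 95 and the sample strings are assumptions for illustration, not part of the original crawler.

```php
<?php
// Hypothetical helper (not from the original post): returns true when two
// strings are at least $threshold percent similar according to similar_text().
function isProbablySamePage(string $a, string $b, float $threshold = 95.0): bool
{
    // Strip stray whitespace or newlines, which alone make strcmp() non-zero.
    $a = trim($a);
    $b = trim($b);

    if ($a === $b) {
        return true; // exact match, skip the slower comparison
    }

    // similar_text() writes the similarity percentage into $percent.
    similar_text($a, $b, $percent);

    return $percent >= $threshold;
}

// Illustrative inputs: the same page fetched twice with a different session ID.
$previous = "<html><body>Welcome! Session: abc123</body></html>";
$current  = "<html><body>Welcome! Session: xyz789</body></html>";

if (isProbablySamePage($previous, $current, 80.0)) {
    echo "redundancy\n";
}
```

    similar_text() can get slow on long strings, so for full page bodies it may be worth comparing only a portion of each page, or comparing the URLs themselves instead of the fetched content. The right threshold depends on how much of each page changes between requests and would need tuning.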