PHP endsWith() function fails on one server but works on another?


I have a script which is essentially a crawler to index news articles. The script works fine on one server (main http server), but I am trying to move it to a dedicated platform and one section will not function.

The part that fails uses a simple function (from SO) to check if a string (a url found by the crawler) matches an exclusion list stored locally in a .txt file.

I have tested to make sure the .txt file is received using a var_dump and everything shows ok.

On the new server this consistently fails to unset or echo any positive match, but on the other server everything works ok.

The important part is as follows:

<?php
ini_set('display_errors', 1);
$linkurl_reg = '/href="http:\/\/metro.co.uk(.+?)"/is';    


function endsWith($haystack, $needle)
{
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}

$data = file_get_contents("http://metro.co.uk");
preg_match_all($linkurl_reg,$data,$new_links);

$exclusion_list = explode("\n",file_get_contents('../F/exclusion_list.txt'));

var_dump($exclusion_list); //just to check we got the file ok

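$link_count = count($new_links[1]); // cached, because unset() below shrinks the array
for ($i = 0; $i < $link_count; $i++) {
    for ($ii = 0; $ii < count($exclusion_list); $ii++) {
        if (endsWith($new_links[1][$i], $exclusion_list[$ii])) {
            echo 'unset ';
            unset($new_links[1][$i]);
            break; // the link is gone, so stop checking further exclusions
        } else {
            echo 'not unset ';
        }
    }
}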


?> 

The strange thing is that if I set the exclusion list to a single value myself, e.g.

$exclusion_list[0] = "xmlrpc.php"; 

instead of

$exclusion_list = explode("\n",file_get_contents('../F/exclusion_list.txt'));

it will work for that particular string.

If anybody has any ideas, please share them. I have been staring at this for 3 days now and am completely stumped.

Things I have tried:

encoding the $exclusion_list array to UTF before exploding.

encoding the $exclusion_list strings to UTF in the loop

tested the function with normal strings

writing the strings in manually rather than taking them from the array or file_get_contents (annoyingly, this works)

changing the file extension from .txt to various other things

updating the PHP version on the non-working server

replacing "\n" with "\r" and "\n\r" during explode

I have even tried swapping the function for some of the others found on SO. Strangely, I get the same results: it works with strings I define but not with anything retrieved from the exclusion_list file.

For the life of me I have no idea why one would work and not the other.

Current PHP version: 5.4.36-0+deb7u3 (non-working server)

Current PHP version: 5.2.17 (working server)

Requested var_dump of $exclusion_list (non-working server):

array(9) {
  [0]=>
  string(6) ".jpeg"
  [1]=>
  string(5) ".jpg"
  [2]=>
  string(5) ".gif"
  [3]=>
  string(5) ".css"
  [4]=>
  string(5) ".xml"
  [5]=>
  string(11) "xmlrpc.php"
  [6]=>
  string(21) "metro.co.uk" target="
  [7]=>
  string(20) "metro.co.uk/osd.xml"
  [8]=>
  string(32) "metro.co.uk/terms/#privacypolicy"
}

Requested var_dump of $exclusion_list (working server):

array(9) {
  [0]=>
  string(5) ".jpeg"
  [1]=>
  string(4) ".jpg"
  [2]=>
  string(4) ".gif"
  [3]=>
  string(4) ".css"
  [4]=>
  string(4) ".xml"
  [5]=>
  string(10) "xmlrpc.php"
  [6]=>
  string(20) "metro.co.uk" target="
  [7]=>
  string(19) "metro.co.uk/osd.xml"
  [8]=>
  string(32) "metro.co.uk/terms/#privacypolicy"
}

Both servers run Linux, and neither text file was created or edited on a Windows platform.
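
One detail in the two dumps is worth a closer look: on the non-working server ".jpeg" is reported as string(6) although only five characters are visible, and the same extra byte shows up on every entry except the last. A quick way to see what that byte is would be to print each entry as hex. The snippet below is only a sketch with made-up sample strings; describeBytes() is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical samples: the same entry without and with a stray carriage return.
$clean = ".jpeg";
$dirty = ".jpeg\r";

// Hypothetical helper: report the byte length and the raw bytes of a string as hex.
function describeBytes($s)
{
    return strlen($s) . ' bytes: ' . bin2hex($s);
}

echo describeBytes($clean), "\n"; // 5 bytes: 2e6a706567
echo describeBytes($dirty), "\n"; // 6 bytes: 2e6a7065670d  (0d is "\r")
```

Running describeBytes() over the real $exclusion_list entries would show whether the extra byte is 0d (a carriage return) or something else.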


Solution

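  • Make sure the lines in your *.txt file are separated by \n and not \r\n, which is what you get if the file was saved by a Windows program.

    The var_dump from the non-working server points to this: every entry except the last is one byte longer than its visible text (string(6) for ".jpeg"). After you explode on "\n", each of those strings still ends with "\r" and so cannot fulfill the endsWith() condition.

    This code should work on both machines: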

    $exclusion_list = explode("\n",str_replace("\r", "", file_get_contents('../F/exclusion_list.txt')));
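
A slightly more defensive variant, offered as a sketch rather than a definitive fix: split on any of the common line endings with preg_split(), trim each entry, and skip blank lines. The blank-line check matters because endsWith() returns true for an empty needle, so a trailing newline in the file would otherwise exclude every link. The helpers parseExclusionList() and isExcluded() and the sample data are hypothetical; only endsWith() comes from the question.

```php
<?php
// endsWith() as given in the question.
function endsWith($haystack, $needle)
{
    return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}

// Hypothetical helper: turn raw file contents into a clean list of suffixes.
function parseExclusionList($raw)
{
    $entries = array();
    // Split on \r\n, \n, or a lone \r so the origin of the file does not matter.
    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        $line = trim($line); // also drops stray spaces around an entry
        if ($line !== '') {  // skip blank lines, e.g. after a trailing newline
            $entries[] = $line;
        }
    }
    return $entries;
}

// Hypothetical helper: true if $url ends with any entry of $exclusions.
function isExcluded($url, $exclusions)
{
    foreach ($exclusions as $suffix) {
        if (endsWith($url, $suffix)) {
            return true;
        }
    }
    return false;
}

// Sample data standing in for file_get_contents('../F/exclusion_list.txt').
$raw = ".jpeg\r\n.jpg\r\nxmlrpc.php\r\n";
$exclusions = parseExclusionList($raw);

// Made-up links standing in for $new_links[1].
$links = array('/2015/01/photo.jpeg', '/news/story/', '/xmlrpc.php');
$kept = array();
foreach ($links as $link) {
    if (!isExcluded($link, $exclusions)) {
        $kept[] = $link;
    }
}

print_r($kept); // only '/news/story/' remains
```

Collecting the links to keep in a new array, instead of calling unset() while looping by index, also avoids the gaps and the shrinking count() that unset() causes mid-loop.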