Problem with PHP script to get domain age (file_get_contents)


Hi, I have a problem in PHP that I don't understand.

I wrote a PHP script that gets the age of specific domains from the Wayback Machine using file_get_contents.

The domains are all in an array called $domain, which comes from a text field filled in by the user.

The script works fine, but only for the first domain in the array. For the second domain, the loop gives me only strange values or nothing at all.

I don't know why. I can't see any mistake, and all the domains in the array are correct.

Can anyone tell me what I am doing wrong?

//Array with Domains
$domain = explode("\n",trim($_POST['url']));

// Print the Array for debugging
print_r($domain);



// count domains for the loop
$count = count($domain);
echo $count;

for ($i = 0; $i < $count; $i++) {

$content=file_get_contents('http://web.archive.org/cdx/search/cdx?url='.$domain[$i].'',FALSE, NULL, 1, 600);

//use the data from file_get_contents to calculate the age

preg_match('/\d+/', $content, $date); 
$startyear= substr($date[0], 0, -10);
$startmonth=  substr($date[0], 4, -8);
$actualyear= date("Y");


// calculate the year & month
$years= $actualyear- $startyear;
$month= 12-$startmonth;

//echo the Age

echo " <div style='font-size:20px;text-align:center;width:100%;height:5%;color:#25bb7f;
    font-weight: bold;'> $domain[$i]: $years Jahre und $month Monate </div>"; 

}

Solution

  • I think the problem lies in URL encoding. The domains that you pass to 'http://web.archive.org/cdx/search/cdx?url=' have to be fully URL-encoded. Here is how to do that:

    //Array with Domains
    $domain = explode("\n",trim($_POST['url']));
    
    
    # Trim each entry (textarea lines may end in "\r"), then URL-encode it.
    $domain = array_map(function($domain){ return urlencode(trim($domain)); }, $domain);
    
    // Print the Array for debugging
    print_r($domain);
    
    
    
    // count domains for the loop
    $count = count($domain);
    echo $count;
    
    for ($i = 0; $i < $count; $i++) {
    
    $content=file_get_contents('http://web.archive.org/cdx/search/cdx?url='.$domain[$i].'',FALSE, NULL, 1, 600);
    
    //use the data from file_get_contents to calculate the age
    
    preg_match('/\d+/', $content, $date); 
    $startyear= substr($date[0], 0, -10);
    $startmonth=  substr($date[0], 4, -8);
    $actualyear= date("Y");
    
    
    // calculate the year & month
    $years= $actualyear- $startyear;
    $month= 12-$startmonth;
    
    //echo the Age
    
    $domainNonEncoded = htmlspecialchars(urldecode($domain[$i])); # get the decoded url
    
    echo " <div style='font-size:20px;text-align:center;width:100%;height:5%;color:#25bb7f;
        font-weight: bold;'> {$domainNonEncoded}: $years Jahre und $month Monate </div>"; 
    
    }
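
A possible further refinement, offered as a sketch and not as a confirmed diagnosis: browsers submit textarea line breaks as \r\n, so splitting on "\n" alone can leave a trailing "\r" on every entry except the last. Also, '/\d+/' matches the first run of digits in the response, which may belong to the domain itself instead of the capture timestamp. The helpers below (normalizeDomains and ageFromCdx are hypothetical names made up for this illustration) clean the input and compute the age from the first standalone 14-digit timestamp. This assumes the CDX response lists captures oldest first and uses a YYYYMMDDhhmmss timestamp field. The sketch also swaps the 12 - $startmonth step for DateTimeImmutable::diff, so the months reflect elapsed time.

```php
<?php
// Hypothetical helpers for illustration; neither name comes from the original post.

/**
 * Split raw textarea input into clean, non-empty domain strings.
 */
function normalizeDomains(string $raw): array
{
    // \R matches any newline sequence (\r\n, \n, or \r).
    $lines = preg_split('/\R/', $raw);
    $lines = array_map('trim', $lines);

    // Drop blank lines and reindex the result from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

/**
 * Find the first standalone 14-digit timestamp in a CDX response body and
 * return the whole years and months elapsed up to $now, or null if no
 * timestamp is found.
 */
function ageFromCdx(string $cdxBody, DateTimeImmutable $now): ?array
{
    // Assumption: the capture timestamp is a separate 14-digit field
    // (YYYYMMDDhhmmss), so digits inside the domain name are skipped.
    if (!preg_match('/\b(\d{14})\b/', $cdxBody, $m)) {
        return null;
    }

    $first = DateTimeImmutable::createFromFormat('YmdHis', $m[1], new DateTimeZone('UTC'));
    if ($first === false) {
        return null;
    }

    $diff = $first->diff($now);

    return ['years' => $diff->y, 'months' => $diff->m];
}
```

Each cleaned entry could then be passed through urlencode() before it is appended to the CDX URL, and the response body handed to ageFromCdx($content, new DateTimeImmutable('now', new DateTimeZone('UTC'))). A null result signals that no capture timestamp was found, which the loop can report instead of printing an empty age.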