I have implemented a function to validate .edu domains. This is how I am doing it:
if( preg_match('/edu/', $matches[0])==FALSE )
    return FALSE;
return TRUE;
Now I also want to skip URLs that point to documents such as .pdf and .doc files.
For this, the following code should work, but it does not:
if( preg_match('/edu/', $matches[0])==FALSE || preg_match('/pdf/i', $matches[0])!=FALSE || preg_match('/doc/i', $matches[0]!=FALSE))
return FALSE;
return TRUE;
Where am I going wrong? Moreover, how can I use preg_match with a list of document types to check for in a URL string? If any of the listed document types is found, the function should return false. In other words, I want to provide a list (an array, maybe) of document types as the $pattern to look for in a URL.
Note: $matches[0] contains the whole URL string, e.g. http://www.nust.edu.pk/Documents/pdf/NNBS_Form.pdf
The code for the function:
public function validateEduDomain($url) {
    // get host name from URL
    preg_match('@^(?:http://)?([^/]+)@i', $url, $matches);
    $host = $matches[1];
    // get last two segments of host name
    preg_match('/[^.]+\.[^.]+$/', $host, $matches);
    if( preg_match('/edu/', $matches[0])!=FALSE && (preg_match('/pdf/i', $matches[0])==FALSE || preg_match('/doc/i', $matches[0]==FALSE)))
        return TRUE;
    return FALSE;
}
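I wonder why you are making everything so complicated. I also noticed you have $$matches[0] instead of $matches[0]. In addition, in preg_match('/doc/i', $matches[0]!=FALSE) the closing parenthesis is misplaced, so the result of the comparison, not the URL, is passed to preg_match as the subject. The regexes you want are: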
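// The extensions are grouped as (pdf|doc) so that both the leading dot and the
// end-of-string anchor apply to every alternative.
if( preg_match('/^https?:\/\/[A-Za-z]+[A-Za-z0-9\.-]+\.edu/i', $matches[0]) && !preg_match('/\.(pdf|doc)$/i', $matches[0]) ) {
    // do something here...
}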
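As for the second part of the question (checking against a list of document types), one option is to build the alternation from an array instead of hard-coding it. The following is only a minimal sketch, assuming the URL includes its scheme (http:// or https://) so that parse_url can find the host; the function name isEduPageUrl and the default extension list are hypothetical and not taken from your code:

```php
<?php
// Hypothetical helper (not from the original post): returns true only when the
// host contains "edu" as a whole label and the path does not end in one of the
// listed document extensions.
function isEduPageUrl(string $url, array $blockedExtensions = ['pdf', 'doc', 'docx', 'ppt', 'xls']): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);

    // parse_url returns null or false when no host can be found
    // (assumption: callers pass URLs that include the scheme).
    if (!is_string($host)) {
        return false;
    }

    // "edu" must be a complete host label, e.g. nust.edu.pk or mit.edu,
    // but not education.example.com.
    if (preg_match('/(^|\.)edu(\.|$)/i', $host) !== 1) {
        return false;
    }

    // Nothing to check when there is no path or no extension list.
    if (!is_string($path) || count($blockedExtensions) === 0) {
        return true;
    }

    // Escape each extension, then join them into one alternation:
    // ['pdf', 'doc'] becomes /\.(?:pdf|doc)$/i
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $blockedExtensions);
    $pattern = '/\.(?:' . implode('|', $quoted) . ')$/i';

    return preg_match($pattern, $path) !== 1;
}
```

With the URL from the question, isEduPageUrl('http://www.nust.edu.pk/Documents/pdf/NNBS_Form.pdf') returns false, while isEduPageUrl('http://www.nust.edu.pk/Documents/pdf/') returns true, because only the end of the path is tested and a directory named pdf does not count as a document. Matching against the path rather than the whole URL also keeps a query string from hiding the extension.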