PHP filter_var URL


To validate a URL path from user input, I'm using the PHP filter_var function. The input contains only the path (/path/path/script.php).

Before validating the path, I prepend the host. While testing the input validation, I noticed a strange behavior of the URL filter.

Code:

$url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
var_dump(filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED)); //valid

Can someone explain why this is a valid URL? Thanks!


Solution

  • The short answer is: PHP's FILTER_VALIDATE_URL checks the URL only against RFC 2396, and your URL, although weird, is valid according to that standard.

    Long answer:

    The filter you are using is documented as validating against RFC 2396, so let's check that standard.

    The regular expression that the RFC lists (in Appendix B) for parsing a URI into its components is:

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
     12            3  4          5       6  7        8 9
    

    Where:

    scheme    = $2
    authority = $4
    path      = $5
    query     = $7
    fragment  = $9
    

    As we can see, ":" acts as a delimiter only where it terminates the scheme; from that point onwards ":" is fair game (this is supported by the text of the standard). For example, the http: scheme uses it in the authority to denote a port. A slash can likewise appear anywhere in the path, and nothing prohibits a URL from having "//" somewhere in the middle. So "http://" in the middle is valid.

    Let's look at your URL and try to match it to this regexp:

    $url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
    //Escaped a couple slashes to make things work, still the same regexp
    $result_rfc = preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/',$url);
    echo '<p>'.$result_rfc.'</p>';
    

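    The test returns '1', so the URL matches. This is to be expected: as we have seen, the rules don't declare URLs that have something like 'http://' in the middle to be invalid. Keep in mind that every group in this regexp is optional, so it is meant for splitting a URI into components and a match on its own says little about validity; the point here is that the second 'http://' simply ends up in the path component. PHP mirrors this behaviour with FILTER_VALIDATE_URL.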
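
    To make the split visible, here is a minimal sketch that captures the groups from the same regexp and maps them to the component names listed above. The variable names ($pattern, $m, $parts) are only illustrative.

```php
<?php
$url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";

// Same RFC 2396 regexp as above, with the slashes escaped for the "/" delimiter
$pattern = '/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/';
preg_match($pattern, $url, $m);

// preg_match drops trailing groups that did not take part in the match,
// so missing indexes fall back to an empty string
$parts = [
    'scheme'    => $m[2] ?? '',
    'authority' => $m[4] ?? '',
    'path'      => $m[5] ?? '',
    'query'     => $m[7] ?? '',
    'fragment'  => $m[9] ?? '',
];

print_r($parts);
```

    The scheme is "http", the authority is "www.domain.nl", and everything from "/http://www.google.nl/..." onwards is treated as the path.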

    If you want a more rigorous test, you will need to write the required code yourself. For example, you can reject URLs in which "://" appears more than once:

    $url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
    $result_rfc = preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/',$url);
    // Accept only if the RFC regexp matches and "://" appears exactly once
    $result_non_rfc = ($result_rfc === 1) && (substr_count($url, '://') === 1);
    

    You can also try and adjust the regular expression itself.
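
    As another option, here is a rough sketch that keeps FILTER_VALIDATE_URL and then inspects only the path component with parse_url. The function name isUrlWithoutEmbeddedScheme is hypothetical, and the rule it applies (no "scheme://" sequence inside the path) is an assumption about what you want to reject, not something the RFC requires.

```php
<?php
// Hypothetical helper: true if the URL passes FILTER_VALIDATE_URL
// and its path does not contain something that looks like "scheme://".
function isUrlWithoutEmbeddedScheme(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url returns null when the URL has no path component
    // and false when it cannot parse the URL at all
    $path = parse_url($url, PHP_URL_PATH);
    if ($path === false) {
        return false;
    }
    if ($path === null) {
        return true;
    }

    // Reject a scheme-like prefix followed by "://" anywhere in the path
    return preg_match('#[a-z][a-z0-9+.\-]*://#i', $path) === 0;
}

var_dump(isUrlWithoutEmbeddedScheme("http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php")); // bool(false)
var_dump(isUrlWithoutEmbeddedScheme("http://www.domain.nl/modules/authorize/test/normal.php")); // bool(true)
```

    Unlike the substr_count check, this looks only at the path, so a "://" inside the query string (for example in a redirect parameter) is not rejected. Whether that is desirable depends on your application.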