PHP: Possible weaknesses of FILTER_VALIDATE_URL + fopen($url, "r") for URL validation

The URLs on my site will contain international characters such as ş, ğ, ı ...

After reading many posts and blogs about URL validation, I decided to supplement filter_var($url, FILTER_VALIDATE_URL) with additional code, because:

  1. if a URL contains international characters, it fails validation
  2. answered by bažmegakapa with +7.

I then decided to use the idea from "PHP validation/regex for URL".

bažmegakapa suggests using

if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";

to check whether the link can be opened; if it can, the URL is treated as valid.

Now my question:

Question: I loved this idea and it seems brilliant to me. But the answer has only +7 while that page has many more answers, so I would like to hear what experienced PHP developers think of it, for the benefit of rookies like me.

Are there any weaknesses in bažmegakapa's code? For example, could there be a URL that fopen cannot open but that is actually harmless and should pass validation? And what is the remedy for any weaknesses you find?

Thank you

Best regards


  • The fact that filter_var($url, FILTER_VALIDATE_URL) considers javascript://test%0Aalert(321) valid is not a weakness. If you think it is, your expectations of what filter_var is for are wrong.

    filter_var($url, FILTER_VALIDATE_URL) validates the syntax of a URL against RFC 2396.

    • It is not meant to determine whether the resource pointed to by the URL is accessible.

    • It is not meant to determine whether it is safe to use the URL as the value of an href attribute in an a element of an HTML document when the URL is provided by a user.

    • It is not meant to consider the scheme (which may place restrictions on URLs that go beyond what is described in RFC 2396). For example, while

      • ftp://foo:bar@baz is a valid FTP URL according to RFC 1738, 3.2 FTP,
      • http://foo:bar@baz is not a valid HTTP URL according to RFC 2616, 3.2.2 http URL (even though some browsers can interpret such "URLs").
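
Since filter_var only checks syntax, any scheme policy has to be layered on top by the application. A minimal sketch of that layering, assuming the policy is "only http and https links are acceptable" (the function name isAllowedWebUrl and the allowlist are hypothetical, not part of PHP or of the original posts):

```php
<?php
// Hypothetical helper: syntax check via filter_var, then an explicit
// scheme allowlist. The allowlist is an application policy assumption.
function isAllowedWebUrl(string $url, array $allowedSchemes = ['http', 'https']): bool
{
    // Step 1: syntax only. This says nothing about safety or reachability.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Step 2: policy. Reject anything whose scheme is not on the allowlist.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), $allowedSchemes, true);
}
```

With this in place, a javascript: or ftp: URL is rejected by the policy step regardless of whether filter_var accepts its syntax. Whether http and https are the right allowlist depends on where the URL ends up being used.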

    filter_var does not bake cakes, nor does it brew coffee. If you require cake or coffee, use something else (RFC 2324 is a good start).

    Depending on the circumstances, displaying a URL which points to a resource that your server cannot access might be a good idea or a bad idea. Depending on the circumstances, displaying a URL that does not point to an HTTP or HTTPS resource might be a good idea or a bad idea. One size does not fit all.
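
If reachability does matter in your case, it is probably cleaner to treat it as a separate check rather than as part of validation. Below is a rough sketch of the fopen idea from the question with a few guards added; the function name isReachable, the 5-second timeout, and the use of a HEAD request are my assumptions, and it relies on allow_url_fopen being enabled:

```php
<?php
// Hypothetical helper: reachability probe, kept separate from syntax
// validation. Assumes allow_url_fopen is enabled in php.ini.
function isReachable(string $url, float $timeoutSeconds = 5.0): bool
{
    // Only probe http(s). Without this guard, fopen() would also open
    // local paths and other stream wrappers such as file:// or php://.
    if (!preg_match('#^https?://#i', $url)) {
        return false;
    }

    // HEAD avoids downloading the body; the timeout bounds slow hosts.
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => $timeoutSeconds],
    ]);

    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        return false;
    }

    fclose($handle); // do not leak the stream handle
    return true;
}
```

Keep in mind that a false result here does not mean the URL is invalid: the server may be temporarily down, may refuse HEAD requests, may answer with a 4xx or 5xx status, or may be blocked from your server's network while being perfectly reachable for your visitors.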