Tags: java, network-programming, jsoup, malformedurlexception

How to check if a URL doesn't exist?


I am trying to check whether a URL I want to connect to exists. Here's my attempt:

try {
    // Connect to the url
    document = Jsoup.connect("http://www.malformedurl.com").get();
    tags = document.select(".tags .tag a");
    num = document.select(".tag .count");
    // Take the wanted data 
    UrlFunctions.UrlParse(tags, num);
} catch (java.net.MalformedURLException e) {
    System.out.println("URL DOESNT EXIST");
}

After running that, I don't get the message URL DOESNT EXIST. What exception should I use or what else should I do?


Solution

  • A MalformedURLException is only thrown when the URL is actually malformed, i.e. when it does not conform to the URL spec, not when it does not exist. Under the covers, it is thrown by the constructor of the java.net.URL class. Its javadoc says the following:

    throws

    MalformedURLException - If the string specifies an unknown protocol.

    So it is only thrown when you use, for example, "www.malformedurl.com" or "foo://www.malformedurl.com" instead of "http://www.malformedurl.com".
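    The difference between "malformed" and "nonexistent" can be seen without opening any connection. The following is a minimal sketch using only the JDK; the class name UrlSyntaxCheck and the helper isWellFormed are made up for illustration. It feeds the three strings mentioned above to the java.net.URL constructor and reports which ones are rejected.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlSyntaxCheck {

    // Hypothetical helper: true when java.net.URL accepts the string.
    // No connection is opened, so this says nothing about whether the host exists.
    // (The URL(String) constructor is deprecated in recent JDKs but still behaves this way.)
    static boolean isWellFormed(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "www.malformedurl.com",        // no protocol
            "foo://www.malformedurl.com",  // unknown protocol
            "http://www.malformedurl.com"  // well-formed, even if the host does not exist
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isWellFormed(candidate));
        }
    }
}
```

    Only the last string passes, which is why the catch block in the question is never reached: the URL is syntactically fine.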

    To detect whether a URL exists, you need a different approach. If the host name is unknown, catch UnknownHostException instead:

    try {
        document = Jsoup.connect("http://www.malformedurl.com").get();
        // ...
    } catch (UnknownHostException e) {
        System.err.println("Unknown host");
        e.printStackTrace(); // I'd rather (re)throw it though.
    }
    

    This is not necessarily a problem at the other end; it can also occur when the DNS server on your network is bogus.
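    The name lookup can also be tried on its own, before any HTTP request is made. This is a rough sketch using the JDK's InetAddress rather than Jsoup; the class name HostLookup and the method resolves are hypothetical, and the host "no-such-host.invalid" is just a placeholder under the reserved .invalid top-level domain, which is not expected to resolve.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostLookup {

    // Hypothetical helper: true when the host name resolves to an address.
    // A false result may also mean that the local DNS setup is failing,
    // not that the host is really absent.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder name under the reserved ".invalid" TLD.
        System.out.println(resolves("no-such-host.invalid"));
        // An IP literal is only checked for format; no lookup is performed.
        System.out.println(resolves("127.0.0.1"));
    }
}
```

    A successful lookup only shows that the name maps to an address. It does not show that a web server answers there or that the requested page exists.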

    Or, to detect whether an IP address is reachable, catch SocketTimeoutException instead:

    try {
        document = Jsoup.connect("http://12.34.56.78").get();
        // ...
    } catch (SocketTimeoutException e) {
        System.err.println("IP cannot be reached");
        e.printStackTrace(); // I'd rather (re)throw it though.
    }
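
    A SocketTimeoutException can only occur when a timeout is in effect, so it helps to set one explicitly. The sketch below reproduces the exception locally and uses the JDK's HttpURLConnection instead of Jsoup, so that it runs without extra libraries. The class name TimeoutProbe, the method timesOut, and the 500 ms value are assumptions chosen for illustration. It demonstrates a read timeout against a local socket that never answers, which is a stand-in for a remote address that does not respond.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

final class TimeoutProbe {

    // Hypothetical helper: true when the request ends in a SocketTimeoutException.
    static boolean timesOut(String spec, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(spec).openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit for waiting on the response
        try {
            conn.getResponseCode();            // sends the request and waits for a status line
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // A local socket that accepts the TCP connection but never sends a response.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            System.out.println(timesOut(url, 500));
        }
    }
}
```

    Other failures, such as a refused connection, surface as different IOException subclasses and are deliberately left to propagate here, in line with the advice above to rethrow rather than swallow them.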