java web-crawler mime

How to check if a URL is a document or a web page using Java


I'm building an application similar to a URL crawler, in which I need to differentiate between a normal web page and a PDF, image, or doc. I've tried every MIME type check I could find... :(


Solution

  • That will do the job:

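    // Needs: java.net.URL, java.net.URLConnection, java.io.IOException
    static String contentType(String address) throws IOException {
        URL url = new URL(address);
        URLConnection u = url.openConnection();
        // Null if the server sends no Content-Type header.
        String type = u.getHeaderField("Content-Type");
        return type;
    }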
    

    Returns

    text/html; charset=utf-8

    for this page.
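
To separate web pages from PDFs, images, and docs, the returned header value still has to be interpreted. The following is only a rough sketch of one way to do that: the class name `ContentKind`, its methods, and the category labels are made up for illustration, and the list of media types is not exhaustive. It also assumes the server reports an accurate Content-Type, which is not guaranteed.

```java
import java.util.Locale;

// Hypothetical helper: maps a Content-Type header value to a coarse category.
final class ContentKind {

    // Drops parameters such as "; charset=utf-8" and lowercases the media type.
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return ""; // header missing
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String base = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Labels ("webpage", "pdf", ...) are illustrative, not a standard.
    static String classify(String contentTypeHeader) {
        String type = mediaType(contentTypeHeader);
        if (type.equals("text/html") || type.equals("application/xhtml+xml")) {
            return "webpage";
        }
        if (type.equals("application/pdf")) {
            return "pdf";
        }
        if (type.startsWith("image/")) {
            return "image";
        }
        if (type.equals("application/msword")
                || type.startsWith("application/vnd.openxmlformats-officedocument.")) {
            return "document";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("text/html; charset=utf-8"));
    }
}
```

The value passed to `classify` would be the string obtained from `getHeaderField("Content-Type")` in the snippet above. A missing header falls into "other", so a crawler may want a fallback for that case.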