
Why does Java properly fetch one webpage's content, but not another?


I'm trying to fetch a CSV-formatted webpage to use as a rudimentary database. The test page is at http://prog.bhstudios.org/bhmi/database/get, and browsers open it without a problem. However, when I run the following code, Java throws an IOException reporting an HTTP 403 response:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main
{

    static
    {
        Logger.getGlobal().setLevel(Level.ALL);
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException
    {
        InputStream is = null;
        try
        {
            System.out.println("Starting...");
            URL url = new URL("http://prog.bhstudios.org/prog/bhmi/database/get/");
            URLConnection urlc = url.openConnection();
            urlc.connect();
            is = urlc.getInputStream();
            int data;
            while ((data = is.read()) != -1)
            {
                System.out.print((char)data);
            }
            System.out.println("\r\nSuccess!");
        }
        catch (IOException ex)
        {
            Logger.getGlobal().log(Level.SEVERE, ex.getMessage(), ex);
            System.out.println("\r\nFailure!");
        }
        finally
        {
            // Close the stream even if something unexpected is thrown above.
            if (is != null)
                is.close();
        }
    }
}

Here's the console output:

Starting...
Nov 18, 2013 3:01:48 PM org.bh.mi.Main main
SEVERE: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
java.io.IOException: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
    at org.bh.mi.Main.main(Main.java:36)
Failure!

Note that 403 (Forbidden) means the server is up and understood the request, but refuses to fulfill it. Now here's the kicker: if I fetch, say, http://example.com, it works just fine!

How can I get my Java app to read this file from my webserver?


Solution

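  • For some reason, your server is configured to forbid access when the request header

    User-Agent: Java/...

    is present. Java's URLConnection sends that header by default, which is why browsers get the page while your program gets a 403. I was able to reproduce the problem, and got it to work by overriding the header before connecting: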

    URLConnection urlc = url.openConnection();
    urlc.setRequestProperty("User-Agent", "");
    urlc.connect();
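
To see the effect without touching the real server, here is a minimal, self-contained sketch. It makes several assumptions that are not from the question: the local `HttpServer` is only a stand-in that mimics the behavior described above (your server's actual configuration is unknown), and the class name `UserAgentDemo`, the `/get` path, the sample CSV body, and the `MyCsvClient/1.0` agent string are all hypothetical. It also uses `HttpURLConnection.getResponseCode()` so the status can be inspected instead of surfacing as an exception, and it needs Java 9 or later for `readAllBytes()`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentDemo {

    /** Starts a local stand-in server that answers 403 to Java's default User-Agent. */
    static HttpServer startPickyServer() throws IOException {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/get", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = agent != null && agent.startsWith("Java/");
            byte[] body = (blocked ? "Forbidden" : "id,name\n1,example\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Fetches the URL; a null userAgent leaves Java's default header in place. */
    static String fetch(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        int status = conn.getResponseCode();
        // On 4xx/5xx the body is on the error stream, which may be null.
        InputStream raw = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (raw == null) {
            return status + " ";
        }
        try (InputStream in = raw) {
            return status + " " + new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startPickyServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/get");
            System.out.println(fetch(url, null));              // default "Java/..." agent
            System.out.println(fetch(url, "MyCsvClient/1.0")); // hypothetical custom agent
        } finally {
            server.stop(0);
        }
    }
}
```

Against this stand-in, the first call should come back with status 403 and the second with status 200 and the CSV text. A descriptive agent string is used here rather than the empty one above; whether your server accepts either depends on its actual rules, so treat the value as something to experiment with.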