403 Forbidden with Java but not web browser?


I am writing a small Java program to get the number of results for a given Google search term. For some reason, the request returns 403 Forbidden from Java, although the same URL gives the right results in a web browser. Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;


public class DataGetter {

    public static void main(String[] args) throws IOException {
        getResultAmount("test");
    }

    private static int getResultAmount(String query) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
                .getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
            src += line;
        }
        System.out.println(src);
        return 1;
    }

}

And the error:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at DataGetter.getResultAmount(DataGetter.java:15)
    at DataGetter.main(DataGetter.java:10)

Why is it doing this?


Solution

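  • You just need to set a User-Agent header for it to work. By default, Java's URLConnection identifies itself as Java/<version>, and Google apparently rejects requests with that agent: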

    // Additional imports: java.net.URLConnection, java.nio.charset.Charset.
    URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
    // Present a browser-like User-Agent instead of Java's default.
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();

    BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));

    // Collect the whole page into one string.
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    r.close();
    String response = sb.toString(); // reused by the parsing step further down
    System.out.println(response);
    

    SSL was handled transparently for you, as the HttpsURLConnectionImpl frame in your exception stack trace shows.
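
To look at the header fix in isolation, here is a minimal sketch. The names UserAgentDemo, BROWSER_UA and openWithUserAgent are made up for illustration and are not part of the original code. It only prepares the connection and reads the pending header back, so no request is sent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

class UserAgentDemo {

    // Hypothetical constant: any browser-like User-Agent string can go here.
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) "
            + "Chrome/23.0.1271.95 Safari/537.11";

    // Opens a connection object with the User-Agent header preset.
    // openConnection() does not contact the server; that happens on connect()
    // or when the response is first read.
    static URLConnection openWithUserAgent(String url) {
        try {
            URLConnection connection = new URL(url).openConnection();
            connection.setRequestProperty("User-Agent", BROWSER_UA);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection connection = openWithUserAgent("https://www.google.com/search?q=test");
        // Reads back the header that will be sent once the connection is used.
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Whether a given User-Agent string is accepted is up to the server, so treat the value above as an example rather than a guarantee.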

    Getting the number of results is not quite that simple, though. After this first request you have to pretend to be a browser: keep the cookie from the response and follow the redirect link (with its token) embedded in the page.

    // Additional imports: java.util.regex.Matcher, java.util.regex.Pattern.
    // response is the HTML of the first request, i.e. sb.toString() from the snippet above.
    String cookie = connection.getHeaderField("Set-Cookie").split(";")[0];
    // The page contains a meta refresh of the form content="0;url=...".
    Pattern pattern = Pattern.compile("content=\"0;url=(.*?)\"");
    Matcher m = pattern.matcher(response);
    if (m.find()) {
        // Second request: follow the redirect, sending the cookie back.
        String url = m.group(1);
        connection = new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        connection.setRequestProperty("Cookie", cookie);
        connection.connect();
        r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
        sb = new StringBuilder();
        while ((line = r.readLine()) != null) {
            sb.append(line);
        }
        response = sb.toString();
        pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
        m = pattern.matcher(response);
        if (m.find()) {
            // The count can exceed Integer.MAX_VALUE, so the enclosing method must return long.
            long amount = Long.parseLong(m.group(1).replace(",", ""));
            return amount;
        }
    }
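
The cookie and redirect extraction above can be tried without any network access. This is a rough sketch, assuming hypothetical helper names (RedirectParsing, cookiePair, redirectUrl) and made-up sample strings. The real header and page contents depend on what the server sends.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectParsing {

    // Same pattern as in the snippet above: the target of a content="0;url=..." meta refresh.
    private static final Pattern META_REFRESH = Pattern.compile("content=\"0;url=(.*?)\"");

    // Keeps only the leading name=value pair of a Set-Cookie header value.
    static String cookiePair(String setCookieHeader) {
        return setCookieHeader.split(";")[0].trim();
    }

    // Returns the redirect target if the page contains a matching meta refresh.
    static Optional<String> redirectUrl(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        String header = "NID=abc123; expires=Thu; path=/";
        String html = "<meta content=\"0;url=https://example.com/next?t=1\" http-equiv=\"refresh\">";

        System.out.println(cookiePair(header));                         // prints NID=abc123
        System.out.println(redirectUrl(html).orElse("no redirect found"));
    }
}
```

Returning an Optional makes the "no redirect in the page" case explicit instead of silently skipping the second request.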
    

    Running the full code I get 2930000000L as a result.
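
The final parsing step can also be checked on its own. The sketch below assumes a hypothetical helper, parseResultCount, and a hand-written HTML fragment in the shape the regex above expects. Google's actual markup may differ or change.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResultCountParsing {

    // Same pattern as in the answer's code.
    private static final Pattern RESULT_STATS =
            Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");

    // Extracts the result count, or returns empty if the page has no matching element.
    static OptionalLong parseResultCount(String html) {
        Matcher m = RESULT_STATS.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        // Strip the thousands separators before parsing.
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        // Hand-written sample fragment, not real server output.
        String html = "<body><div id=\"resultStats\">About 2,930,000,000 results</div></body>";
        OptionalLong count = parseResultCount(html);
        System.out.println(count.isPresent() ? count.getAsLong() : -1); // prints 2930000000
    }
}
```

Note that 2930000000 is larger than Integer.MAX_VALUE (2147483647), which is why the count is a long rather than the int returned by the method in the question.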