Tags: java, parsing, html-parsing, google-search

Why is the Google page fetched programmatically different from the normal Google page?


We want to fetch the current Google results page programmatically. We have tried many techniques in different programming languages, but we have not managed to get the correct (current) Google page.

Java code example

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;

    public class GoogleParser {

        public static void main(String[] args) {
            GoogleParser googleParser = new GoogleParser();
            googleParser.execute();
        }

        // Builds the search URL, downloads the result page, and prints the raw HTML.
        public void execute() {
            String[] params = {"ankara nüfusu"};
            final URL url = encodeGoogleQuery(params);

            System.out.println("Downloading [" + url + "]...\n\n\n\n\n");
            try {
                final String html = downloadString(url);
                System.out.println(html);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Reads the whole stream into memory; the bytes are decoded with the platform default charset.
        private static String downloadString(final InputStream stream) throws IOException {
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            int ch;
            while (-1 != (ch = stream.read())) {
                out.write(ch);
            }
            return out.toString();
        }

        // Opens a connection with a custom User-Agent header and returns the response body.
        private static String downloadString(final URL url) throws IOException {
            final String agent = "Mozilla/21.0 (Windows; U; Windows 7; en-US)";
            final URLConnection connection = url.openConnection();
            connection.setRequestProperty("User-Agent", agent);
            // try-with-resources closes the stream once the body has been read.
            try (InputStream stream = connection.getInputStream()) {
                return downloadString(stream);
            }
        }

        // URL-encodes each query term, joins the terms with "+", and builds the search URL.
        private static URL encodeGoogleQuery(final String[] args) {
            try {
                final StringBuilder localAddress = new StringBuilder();
                localAddress.append("/search?q=");

                for (int i = 0; i < args.length; i++) {
                    final String encoding = URLEncoder.encode(args[i], "UTF-8");
                    localAddress.append(encoding);
                    if (i + 1 < args.length) {
                        localAddress.append("+");
                    }
                }

                return new URL("http", "www.google.com", localAddress.toString());

            } catch (final IOException e) {
                // Errors should not occur under normal circumstances; keep the cause for debugging.
                throw new RuntimeException(
                        "An error occurred while encoding the query arguments.", e);
            }
        }
    }

First image: the result page returned to the Java code.
Second image: the current Google page.

The HTML page that the Java code gets from Google is different from the current Google page:

  1. The search results are different.
  2. It does not contain the Google Now result (the "4,551 milyon (2011)" part).
  3. It does not contain the Google Graph result (the Ankara information on the right side).
  4. The page is older than the current one.
  5. The navigation items (Web, Images, Videos) are on the left side; normally they are below the search bar.

Do you have any idea how to get the current (latest) Google page programmatically in Java? Solutions in other languages would also help to solve the problem.

Thank you for your response


Solution

  • Google is good at detecting who is sending the request, so:

    1. Make sure you send the same cookies as your browser does.
    2. Make sure you send the same user-agent string as your browser, or at least a valid one.
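
The two points above could be sketched roughly as follows. This is only an illustration, not a guaranteed way to get the browser version of the page: the class name `BrowserLikeRequest`, its helper methods, the user-agent string, and the cookie name and value are all placeholders invented for this example. Copy the real values from the requests your own browser sends (for example from its developer tools).

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: copies browser-like headers onto a URLConnection.
public class BrowserLikeRequest {

    // Builds the request headers from a user-agent string and a set of cookies.
    public static Map<String, String> browserHeaders(String userAgent,
                                                     Map<String, String> cookies) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", userAgent);
        if (!cookies.isEmpty()) {
            // A Cookie header has the form "name1=value1; name2=value2".
            StringBuilder cookieHeader = new StringBuilder();
            for (Map.Entry<String, String> cookie : cookies.entrySet()) {
                if (cookieHeader.length() > 0) {
                    cookieHeader.append("; ");
                }
                cookieHeader.append(cookie.getKey()).append('=').append(cookie.getValue());
            }
            headers.put("Cookie", cookieHeader.toString());
        }
        return headers;
    }

    // Opens a connection and applies the headers; nothing is sent until the
    // caller reads from the connection, e.g. via getInputStream().
    public static URLConnection open(URL url, Map<String, String> headers) throws IOException {
        URLConnection connection = url.openConnection();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values: replace them with what your own browser sends.
        String userAgent = "PASTE-YOUR-BROWSER-USER-AGENT-HERE";
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("EXAMPLE_COOKIE", "example-value");

        Map<String, String> headers = browserHeaders(userAgent, cookies);
        URLConnection connection =
                open(new URL("https://www.google.com/search?q=ankara+n%C3%BCfusu"), headers);

        // Prints the headers that would be sent; no request is made here.
        System.out.println(headers);
        System.out.println("Prepared request for " + connection.getURL());
    }
}
```

The connection returned by `open` can be passed to the question's own `downloadString(InputStream)` by calling `connection.getInputStream()`, so the rest of the download code stays unchanged.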