I'm writing a small program that runs a Google search for a song and prints its lyrics. I'm using HtmlUnit with Java for this. I search for the target text and then click the first Google result. However, the page I get differs from what I see when I run the same search in my browser.
My mistake is probably in the XPath, but I'm not sure, because I checked it with both Google Chrome's XPath viewer and two Firefox extensions.
In Chrome, I right-click the element whose XPath I want, then right-click the anchor (<a>) in the bottom window and select Copy XPath. Then I change the double quotes (") to single quotes (') where needed.
Here's my source code so far. I hard-coded a random song for now.
Thank you very much.
Source code:
(I tried lots of things, so I'm sorry for the messy source code. I left the commented-out lines in to show you what I've tried so far. Thank you again.)
import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class dsa {
    public static void main(String args[]) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setThrowExceptionOnScriptError(false);
        //webClient.setJavaScriptEnabled(false);
        String address = "http://www.google.com/search?q=";
        String searchString = "Metallica - Whiskey In The Jar";
        //String searchString = "testtesttest";
        String someString = address.concat(searchString);
        String lastString = someString.concat(" site:randomlyricswebpageblabla.com");
        // site:anotherrandomlyricswebpage.com
        HtmlPage currentPage = webClient.getPage(lastString);
        /*
        HtmlTextInput searchBox = (HtmlTextInput) currentPage.getElementById("search_input");
        searchBox.setTextContent("Amorphis - From The Heaven Of My Heart");
        HtmlButtonInput button = (HtmlButtonInput) currentPage.getElementById("search_button");
        HtmlPage newPage = button.click();
        */
        //System.out.println(currentPage.asText());
        //
        //
        //HtmlElement element = (HtmlElement)currentPage.getByXPath("//h3").get(0);
        //DomNode result = element.getChildNodes().get(0);
        //HtmlAnchor hede = (HtmlAnchor) element.getFirstChild();
        //HtmlPage newPage = hede.click();
        //HtmlElement firstGoogleResult = (HtmlElement) currentPage.getByXPath("//*[@id='rso']/li[1]/div/h3/a").get(0);
        //HtmlAnchor testAnchor = (HtmlAnchor) firstGoogleResult.getFirstChild();
        HtmlAnchor firstGoogleResult = (HtmlAnchor) currentPage.getByXPath("//*[@id='rso']/li[1]/div/h3/a").get(0);
        HtmlPage newPage = firstGoogleResult.click();
        //HtmlAnchor linkTest = (HtmlAnchor) newPage.getByXPath("//*[@id='contentdiv_left']/div/div[3]/text()[1]");
        //HtmlDivision divContent = (HtmlDivision) newPage.getByXPath("\\div[contains(@class, 'contentdiv_leftbox_data')]");
        //System.out.println(divContent.asText());
        //System.out.print("*************\n\n\n" + newPage.asText());
        System.out.println(newPage.asText());
    }
}
After running the program, the console shows only:
Tweet Button
Tweet
So, is my XPath for the first Google search result wrong, or am I mistaken elsewhere?
Thank you very much.
You get the wrong data because of the user agent.
When Google gets a request, it looks up earlier searches associated with data such as your IP address, your web browser, and details about your PC, and tailors the response accordingly.
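If the markup differs between clients, an XPath copied from a desktop browser may match a different element, or nothing at all, in the page HtmlUnit receives. Here is a minimal sketch of that effect using only the JDK's javax.xml.xpath. Both markup strings and the class name XPathVariantCheck are made up for illustration; they are not Google's real markup.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathVariantCheck {
    // Hypothetical markup, shaped so that the XPath from the question matches.
    public static final String DESKTOP_MARKUP =
        "<html><body><ol id='rso'><li><div><h3>"
        + "<a href='http://example.com/lyrics'>Result</a>"
        + "</h3></div></li></ol></body></html>";

    // Hypothetical markup for a different client: same link, different structure.
    public static final String OTHER_MARKUP =
        "<html><body><div id='ires'><p>"
        + "<a href='http://example.com/lyrics'>Result</a>"
        + "</p></div></body></html>";

    // Returns how many nodes the XPath expression selects in the given markup.
    public static int countMatches(String markup, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(markup)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String expression = "//*[@id='rso']/li[1]/div/h3/a";
        System.out.println(countMatches(DESKTOP_MARKUP, expression)); // prints 1
        System.out.println(countMatches(OTHER_MARKUP, expression));   // prints 0
    }
}
```

In the HtmlUnit program, the same kind of check would be to look at the size of the list returned by getByXPath before calling get(0), and to print currentPage.asXml() to see what markup was actually served.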
I don't know what the default user agent for HtmlUnit is, but if you set it to match the browser you're actually using, you should get the same response.
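In the question's code, the user agent comes from the BrowserVersion passed to the WebClient constructor (there, BrowserVersion.FIREFOX_3_6). To show the idea without depending on HtmlUnit, here is a rough sketch that uses the JDK's java.net.http (Java 11+) instead. It builds a search request with an explicit User-Agent header and a URL-encoded query. The class name, the method name, and the example User-Agent string are assumptions for illustration; substitute the string your own browser sends.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class UserAgentRequest {
    // Example value only (assumption): use the exact string your browser reports.
    public static final String BROWSER_USER_AGENT =
        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6";

    // Builds (but does not send) a GET request for the search with the given User-Agent.
    public static HttpRequest buildSearchRequest(String query, String userAgent) {
        // Encode the query so spaces and other special characters are valid in the URL.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("http://www.google.com/search?q=" + encoded))
            .header("User-Agent", userAgent)
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
            buildSearchRequest("Metallica - Whiskey In The Jar", BROWSER_USER_AGENT);
        System.out.println(request.uri());
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```

The request is only constructed here, not sent, so the sketch runs offline. Note that it also encodes the query; the question's code concatenates a search string containing spaces directly into the URL.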
Also, I'd try searching a proper lyrics website directly instead of going through Google. I don't know any American lyrics websites, but one should be easy to find.
Hope that helps!