Tags: java, authentication, htmlunit

Getting data from a webpage with login requirement Java


So recently I decided to teach myself how to get data from webpages. I managed to get data from JSON from a different webpage but when I try to copy everything from this website, it doesn't show the data I actually need.

The page I am trying is, for example, http://www.tremorgames.com/index.php?action=shop&page=2 (you might need to register). The data I am trying to get is, for example, the game name, price or stock. If I can get one of them, I will be able to get the rest.

The problem is that dev tools shows the code, but when I try to copy everything to a file using Java, most of the code is missing.

(I tried with Jsoup as well and it doesn't work either). This is what I have for copying from webpages:

BufferedReader reader = null;
try {
    URL url = new URL("http://www.tremorgames.com/index.php?action=shop&page=2");
    reader = new BufferedReader(new InputStreamReader(url.openStream()));
    StringBuffer buffer = new StringBuffer();
    int read;
    char[] chars = new char[1024];
    while ((read = reader.read(chars)) != -1)
        buffer.append(chars, 0, read); 

    return buffer.toString();
} finally {
    if (reader != null)
        reader.close();
}

As I said, I am trying to learn, so any pointers are welcome (I searched for a while until I gave up and wrote the rest of the code).

Thanks in advance.


Solution

  • Okay, so I finished this a while ago but forgot to answer my own question. I used HtmlUnit because it looked like the simplest option.

    import com.gargoylesoftware.htmlunit.WebClient;  
    import com.gargoylesoftware.htmlunit.html.HtmlInput;  
    import com.gargoylesoftware.htmlunit.html.HtmlPage;  
    import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
    

    To get data from that webpage, I first needed to log into the website, which means starting a web client. The thing to remember is that you must keep using the same web client, so create the WebClient in the method that calls the login method (that method will later pass the same WebClient on to fetch data and do anything else you might need).

    WebClient webClient = new WebClient(); //Initiate a WebClient variable.  
    webClient = tremorLogin(webClient);
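
The same client has to be reused because the login state lives in the cookies that the client stores. HtmlUnit's WebClient keeps its own cookie store. The sketch below only illustrates the idea with the JDK's java.net.CookieManager instead, and the host www.example.com and the cookie PHPSESSID=abc123 are made-up values, not something taken from Tremor Games.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SessionCookieDemo {

    // Returns the cookies the given manager would send with a request to uri.
    public static List<String> cookiesFor(CookieManager manager, URI uri) throws IOException {
        Map<String, List<String>> headers = manager.get(uri, new HashMap<>());
        List<String> cookies = headers.get("Cookie");
        return cookies == null ? Collections.emptyList() : cookies;
    }

    // Simulates a login response that sets a (hypothetical) session cookie.
    public static void simulateLogin(CookieManager manager, URI loginUri) throws IOException {
        Map<String, List<String>> response = new HashMap<>();
        response.put("Set-Cookie", Collections.singletonList("PHPSESSID=abc123; Path=/"));
        manager.put(loginUri, response);
    }

    public static void main(String[] args) throws IOException {
        URI login = URI.create("http://www.example.com/");
        URI shop = URI.create("http://www.example.com/index.php?action=shop");

        // This client "logged in", so it holds the session cookie.
        CookieManager loggedIn = new CookieManager();
        simulateLogin(loggedIn, login);

        // A brand-new client knows nothing about that session.
        CookieManager fresh = new CookieManager();

        System.out.println("same client:  " + cookiesFor(loggedIn, shop));
        System.out.println("fresh client: " + cookiesFor(fresh, shop));
    }
}
```

Only the client that went through the login step has a session cookie to send with the shop request. A new WebClient() created after logging in would start without it in the same way.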
    

    Then tremorLogin logs into the website and returns the client, which is assigned back to the webClient variable.

    //Login into Tremor Games and return the client(Saves the cookies).
    private static WebClient tremorLogin(WebClient webClient) throws Exception
    {
        webClient.getOptions().setJavaScriptEnabled(false);
        HtmlPage currentPage = webClient.getPage("http://www.tremorgames.com/"); //Load page at the STRING address.
        HtmlInput username = currentPage.getElementByName("loginuser"); //Find element called loginuser for username
        username.setValueAttribute(user); //Set value for username
        HtmlInput password = currentPage.getElementByName("loginpassword"); //Find element called loginpassword for password
        password.setValueAttribute(pass); //Set value for password
        HtmlSubmitInput submitBtn = currentPage.getElementByName("Submit"); //Find element called Submit to submit form.
        currentPage = submitBtn.click(); //Click on the button.
    
        return webClient;
    }
    

    loginuser is the name of the username text field, which you can find in the website's source code.

    HtmlInput username = currentPage.getElementByName("loginuser");
    

    loginpassword is the name of the password text field, which you can find in the website's source code.

    HtmlInput password = currentPage.getElementByName("loginpassword");
    

    user is your username (a String) and pass is your password (a String).

    username.setValueAttribute(user);  
    password.setValueAttribute(pass);
    

    After filling in the username and password you need to click the submit button. Find the button's name in the website's source code, the same way as for the username and password text fields (the first line below), then click it (the second line).

    HtmlSubmitInput submitBtn = currentPage.getElementByName("Submit"); //Find element called Submit to submit form.
    currentPage = submitBtn.click(); //Click on the button.
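
For a sense of what the click roughly does: assuming the login form is an ordinary HTML form sent with the default URL-encoded encoding (the page source would have to confirm this), the field names and values are joined into a single request body. The sketch below builds such a body with the JDK only. The values alice and p@ss word are placeholders, and LoginFormBody and encodeForm are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class LoginFormBody {

    // Joins name=value pairs with '&', URL-encoding each name and value.
    public static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("loginuser", "alice");         // placeholder username
        fields.put("loginpassword", "p@ss word"); // placeholder password
        System.out.println(encodeForm(fields));
    }
}
```

A browser would normally also include the clicked button's own name and value. HtmlUnit handles all of this when you call click(), which is why the answer never builds the request by hand.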
    

    Once tremorLogin returns, the logged-in web client is available in the original method, and you can use it to get data or anything else you want from the website. In the original method you might have something like

    HtmlPage currentPage = webClient.getPage("http://www.tremorgames.com/index.php?action=shop&searchterm=steam&search_category=5&sort=price_asc&page=1");
    String pageSource = currentPage.asXml();
    

    Once you have the page as XML in pageSource, you should have essentially the same markup you see in developer tools for the logged-in page (JavaScript is disabled here, so anything a script adds later will be missing), and you just need to search through it for the data you need.
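
As a rough sketch of the "search through it" step, the class below pulls name and price pairs out of a page source string with a regular expression. The markup it expects (span elements with the classes name and price) is purely hypothetical, so check the real shop page and adjust the pattern. ShopItemExtractor and extractItems are made-up names as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShopItemExtractor {

    // Hypothetical markup: <span class="name">...</span> followed by
    // <span class="price">...</span>. The \s* parts tolerate the line
    // breaks and indentation that pretty-printed XML contains.
    private static final Pattern ITEM = Pattern.compile(
            "<span class=\"name\">\\s*([^<]+?)\\s*</span>\\s*"
            + "<span class=\"price\">\\s*(\\d+)\\s*</span>");

    // Returns one {name, price} array per item found in the page source.
    public static List<String[]> extractItems(String pageSource) {
        List<String[]> items = new ArrayList<>();
        Matcher matcher = ITEM.matcher(pageSource);
        while (matcher.find()) {
            items.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return items;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the pageSource string.
        String sample =
                "<div class=\"item\">\n"
                + "  <span class=\"name\">\n    Example Game A\n  </span>\n"
                + "  <span class=\"price\">\n    1500\n  </span>\n"
                + "</div>\n"
                + "<div class=\"item\">\n"
                + "  <span class=\"name\">Example Game B</span>\n"
                + "  <span class=\"price\">900</span>\n"
                + "</div>\n";

        for (String[] item : extractItems(sample)) {
            System.out.println(item[0] + " costs " + item[1]);
        }
    }
}
```

Regular expressions are fragile against markup changes. Since currentPage is already a parsed page, querying it directly (for example with getByXPath) is usually the sturdier choice once you know the real structure.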

    Hope this will help and save time for people.