Tags: java, parsing, jsoup

How to change browser version in JSOUP?


I'm trying to parse data from amd.com. In the Opera browser the page looks like that, with each CPU's name and a link to its page in the third column. But when I use jsoup, it gets me this page (the same one I see in IE).
Here is the method that fetches the document:

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    private Document getDocument(String url) {
        try {
            // Pretend to be Opera 66 so the server does not fall back to a legacy page
            String userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 OPR/66.0.3515.72";
            return Jsoup.connect(url).userAgent(userAgent).get();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

For the userAgent I followed TuyenNTA's advice.
I need to get the links to the CPUs' pages.


Solution

  • The reason may be that the web page is changed dynamically by JavaScript. Jsoup only parses the HTML the server returns and does not execute scripts, so it will not see these changes. You could try using jsoup in combination with Selenium: the browser renders the page, and jsoup parses the result. Here is an example (you mentioned the Opera browser in your question, so the example uses the Opera driver):

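            import org.jsoup.Jsoup;
            import org.jsoup.nodes.Document;
            import org.openqa.selenium.WebDriver;
            import org.openqa.selenium.opera.OperaDriver;
            import org.openqa.selenium.opera.OperaOptions;

            // set opera driver location
            System.setProperty("webdriver.opera.driver", "<PATH_TO_operadriver.exe>");
            OperaOptions options = new OperaOptions();
            // point Selenium at the Opera executable
            options.setBinary("<PATH_TO_opera.exe>");
            WebDriver driver = new OperaDriver(options);

            // declared outside the try block so it can be used afterwards
            Document doc;
            try {
                // let the browser load the page and run its JavaScript
                driver.get("http://amd.com");
                // hand the rendered HTML to jsoup
                doc = Jsoup.parse(driver.getPageSource());
            } finally {
                // quit() closes all windows and ends the session, so close() is not needed
                driver.quit();
            }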
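
Once the rendered HTML is in a jsoup Document, the links can be pulled out with a CSS selector. The following is only a minimal sketch: the class name `CpuLinkExtractor`, the selector, and the sample markup are assumptions made for illustration, not the real structure of amd.com, so inspect the page and adjust the selector. It also assumes the document was parsed with a base URI (for example `Jsoup.parse(driver.getPageSource(), driver.getCurrentUrl())`), because `absUrl` returns an empty string for relative links when no base URI is known.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CpuLinkExtractor {

    // Hypothetical selector: links inside the third cell of each table row.
    // The real markup on amd.com may differ; adjust after inspecting the page.
    public static final String THIRD_COLUMN_LINKS = "table tr td:nth-child(3) a[href]";

    // Returns the absolute URL of every link matched by the selector.
    public static List<String> extractLinks(Document doc, String cssQuery) {
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select(cssQuery)) {
            // absUrl resolves relative hrefs against the document's base URI.
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Stand-in for driver.getPageSource(); this markup is invented for the example.
        String html = "<table><tr><td>1</td><td>Desktop</td>"
                + "<td><a href=\"/en/products/cpu/example\">Example CPU</a></td></tr></table>";
        // Passing a base URI lets jsoup turn relative hrefs into absolute URLs.
        Document doc = Jsoup.parse(html, "https://www.amd.com/");
        for (String url : extractLinks(doc, THIRD_COLUMN_LINKS)) {
            System.out.println(url);
        }
    }
}
```

If the selector matches nothing on the real page, the table may not have finished rendering when the page source was read, or its markup may differ from the assumption above.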