java, cookies, jsoup, htmlunit

How to handle JavaScript redirects in jsoup


I have the URL http://www.zara.com/qr/1260020210042 and I am trying to get the final URL it redirects to:

    String url = "http://www.zara.com/qr/1260020210042";
    Response response = Jsoup.connect(url).followRedirects(true).execute();     
    String url2 = response.url().toString();
    Response response2 = Jsoup.connect(url2).followRedirects(true).execute();
    System.out.println(response2.url());

But it does not print the final redirected URL. What should I change? Thanks.

EDIT:

I also tried HtmlUnit, but it does not give me the final link I need either:

        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(true);     
        HtmlPage page = (HtmlPage) webClient.getPage("http://www.zara.com/qr/1260020210042");
        WebResponse response = page.getWebResponse();
        String content = response.getContentAsString();
        System.out.println(page.getUrl());

Solution

  • The HtmlUnit solution suggested by Frederic Klein actually works nicely, but there is a cookie-related caveat; see the "Update" paragraph below.

    First add this dependency to your Maven configuration:

    <dependency>
      <groupId>net.sourceforge.htmlunit</groupId>
      <artifactId>htmlunit</artifactId>
      <version>2.25</version>
    </dependency>
    

    Then use it like this:

    package de.scrum_master.stackoverflow;
    
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebClientOptions;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    
    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;
    
    import static com.gargoylesoftware.htmlunit.BrowserVersion.CHROME;
    import static java.util.logging.Level.OFF;
    import static java.util.logging.Logger.getLogger;
    
    public class Application {
      public static void main(String[] args) throws IOException {
        WebClient webClient = createWebClient();
        String originalURL = "http://www.zara.com/qr/1260020210042";
        String redirectedURL = webClient.getPage(originalURL).getUrl().toString();
        Response response = Jsoup.connect(redirectedURL).execute();
        System.out.println(response.url());
      }
    
      private static WebClient createWebClient() throws MalformedURLException {
        getLogger("com.gargoylesoftware").setLevel(OFF);
        WebClient webClient = new WebClient(CHROME);
        WebClientOptions options = webClient.getOptions();
        options.setJavaScriptEnabled(true);
        options.setRedirectEnabled(true);
        // IMPORTANT: Without the country/language selection cookie the redirection does not work!
        webClient.addCookie("storepath=us/en", new URL("http://www.zara.com/"), null);
        return webClient;
      }
    }
    

    The console log says:

    http://www.zara.com/us/en/man/shoes/leather/brown-braided-leather-ankle-boots-c0p4065286.html
    

    Update: Okay, I found the root cause of your problem. It is not HtmlUnit: redirection on zara.com simply does not work until the user has manually selected a country and language during their first visit, in any browser. That choice is stored in a cookie named storepath (the one set in the sample code); without it, every browser session lands on the front page with the country selection dialogue again. I have updated my sample code to set that cookie to USA + English. Then it works.
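
    As a minimal sketch of the cookie caveat: the string passed to webClient.addCookie can be built from a country and a language code instead of being hard-coded. The class StorePathCookie and the helper cookieFor are hypothetical names used only for illustration. Only the value us/en is confirmed by the sample code; that other country/language pairs follow the same country/language pattern is an assumption.

    ```java
    import java.util.Locale;

    class StorePathCookie {
      // Hypothetical helper: builds the name=value string for the "storepath" cookie.
      // Assumption: the site expects lower-case "country/language", as in "us/en".
      static String cookieFor(String country, String language) {
        return "storepath="
            + country.toLowerCase(Locale.ROOT)
            + "/"
            + language.toLowerCase(Locale.ROOT);
      }

      public static void main(String[] args) {
        // Prints: storepath=us/en
        System.out.println(cookieFor("us", "en"));
      }
    }
    ```

    The result would take the place of the literal "storepath=us/en" in the addCookie call of createWebClient().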

    Enjoy!