jsoup, webscarab

Unable to extract YouTube page source using Jsoup


Using Jsoup, I am able to extract the page source of most websites (the same content you get by right-clicking a webpage and choosing "View Page Source"). But for any YouTube video page, I am unable to extract the page source; Jsoup does not return the proper page source code. I tried the following code, but it failed to extract it.

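import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class App {
  public static void main(String[] args) throws IOException {

    String webUrl = "https://www.youtube.com/watch?v=Zu6o23Pu0Do";
    // Request the page with a browser-like user agent and parse the response
    Document doc = Jsoup.connect(webUrl)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
            .get();

    // Print the HTML as Jsoup parsed it
    System.out.println(doc);

  }
}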

Does anybody have any advice on how to fix this?

I am getting output like the following:

[image: sample output]


Solution

  • A missing user agent can trigger the website's anti-scraping measures. I'm going to assume the problem is that your connection is timing out when you run this. Try setting the following user agent on the connect() call and see if it works for you.

    .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
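    To check whether a timeout or a blocked request is really the cause, something like the following might help. It is only a minimal sketch: the class name FetchWithUserAgent, the prepare() helper, and the 10-second timeout are assumptions made for illustration and are not part of the original code. It sets the user agent and an explicit timeout on the connection, then prints the HTTP status before parsing.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchWithUserAgent {
    // Same user agent string as suggested above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36";

    // Assumption: 10 seconds is an arbitrary value chosen for this sketch.
    static final int TIMEOUT_MILLIS = 10_000;

    // Hypothetical helper: configures the connection but does not send the request yet.
    static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .timeout(TIMEOUT_MILLIS);
    }

    public static void main(String[] args) throws IOException {
        // execute() sends the request and returns the raw response,
        // so the status code can be inspected before parsing.
        Connection.Response response =
                prepare("https://www.youtube.com/watch?v=Zu6o23Pu0Do").execute();
        System.out.println("HTTP status: " + response.statusCode());

        // parse() turns the response body into a Jsoup Document.
        Document doc = response.parse();
        System.out.println(doc);
    }
}
```

    If the request times out, execute() throws an exception instead of returning, so you can tell a timeout apart from a response that simply has different content than expected.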