
Cannot get form from webpage


I am trying to get the login form from:

https://www.etoro.com/login

When I inspect the page in Chrome I can see the form element, but when I use the Jaunt API in Java I cannot get the form.

UserAgent userAgent = new UserAgent();
userAgent.visit("https://etoro.com/login");
List<Form> forms = userAgent.doc.getForms();
System.out.println(forms.size()); // 0

I have little experience with HTML, so any direction would be great!

This is my first post, so if I haven't done something correctly please let me know.

Thank you very much!


Solution

  • Well, you are out of luck with a simple Java web scraper.

    If you look at the source of the page in the browser, you will see that it consists mainly of one long <script>. The login form is not in the HTML the server sends; the browser creates it afterwards by running that JavaScript, which is why Chrome's inspector shows a form but a scraper that only parses the downloaded HTML finds none.

    If you absolutely must scrape this exact form, you need a tool that can execute JavaScript. For this, you could use PhantomJS. It is essentially a complete headless browser that can be controlled through a JavaScript API.

    Search Google for "phantomjs web scraping" to get started.
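
To check the diagnosis above without a browser, one option is to download the raw HTML and count the tags yourself. The following is only a minimal sketch using the JDK's built-in HTTP client (Java 11+); the class name RawHtmlCheck and the helpers countOpeningTags and fetchRawHtml are made up for illustration and are not part of Jaunt. It does not execute any JavaScript, so it sees roughly what a simple scraper sees. The site may also answer differently to a non-browser client, so treat the numbers as a hint rather than proof.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawHtmlCheck {

    // Counts opening tags with the given name (e.g. "form") in raw HTML text.
    // Closing tags such as </form> are not counted.
    static int countOpeningTags(String html, String tagName) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tagName) + "(\\s|>|/)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Downloads the page as the server sends it, without running any JavaScript.
    static String fetchRawHtml(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String html = fetchRawHtml("https://www.etoro.com/login");
        System.out.println("<form> tags:   " + countOpeningTags(html, "form"));
        System.out.println("<script> tags: " + countOpeningTags(html, "script"));
    }
}
```

If the form count is zero while the script count is not, the form is being built client-side, and only a tool that actually runs the page's JavaScript will be able to see it.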