java html-parsing htmlunit html-content-extraction

Java web scraper


What is the best library for building a web scraper in Java? I know of the following options:

  1. Selenium
  2. HtmlUnit
  3. Lobo browser

I need to pick one of them to build a scraper for a project that has to scale.


Solution

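  • If you are only scraping, do you need a browser at all? Plain HTTP requests (the equivalent of basic cURL calls) that fetch a page and return the response body give you what you need, as long as the content you want is in the returned HTML rather than generated by JavaScript.

    Skipping the browser also helps with scalability, since nothing has to be rendered per request. If you do need browser behavior, go for HtmlUnit: it is a headless browser, so it scales better than driving a real one.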
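
    As a rough illustration of the "plain HTTP call" approach, here is a minimal sketch using the JDK's built-in java.net.http.HttpClient (Java 11+). The class name PageFetcher, the User-Agent string, the timeouts, and the example URL are assumptions made for this sketch, not something from the question. The regex-based extractTitle helper only shows that the response is ordinary text you can process; for real extraction a proper HTML parser would be the safer choice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for this sketch: fetches a page without any browser.
public class PageFetcher {

    // Naive pattern for the <title> element; illustration only, not a real HTML parser.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // One client instance can be shared across many requests.
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10)) // assumed timeout
            .build();

    // Issues a GET request and returns the response body as a string.
    public String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))            // assumed timeout
                .header("User-Agent", "example-scraper/0.1") // assumed value
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Returns the trimmed text of the first <title> element, if there is one.
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass the page you actually want as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = new PageFetcher().fetch(url);
        System.out.println(extractTitle(html).orElse("(no title found)"));
    }
}
```

    Because each fetch is just a request and a string, this kind of code can be spread across threads or machines without the per-page cost of a browser, which is the scalability point made above.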