screen-scraping

Which programming language for single-page web scraping?


I want to build (or hire someone to build) a program for Windows. On a command (from the right-click menu or a keyboard shortcut), the program has to save some data from a single web page, such as the name of the website, the product name, and the product price, to a local database. Which programming language would be the best choice? The number of (affordable) programmers available and the possibility of adding extra functionality in the future are also important. I found, for example, that Python, Java, Ruby, and XPath are used for this job. Thank you.


Solution

  • Java, Python, and Ruby are all good choices. XPath is not a programming language; it is a query language that lets you extract the data you want from XML or HTML. Whichever language you choose, you will most likely also use XPath (all three have XPath libraries available).

    • Python seems to be the most popular, but the future of its libraries is also the most uncertain (nobody has bothered to port mechanize to Python 3 yet, and Beautiful Soup died and then came back).
    • Java's biggest strength may be that it's already installed on most Windows machines, but it's also the only one of the three that is not a scripting language, so development time will likely be longer.
    • Ruby is a good choice, with excellent scraping libraries and plenty of programmers using it.
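
To make the XPath point concrete, here is a minimal sketch in Python of the two steps the question describes: pulling the site name, product name, and price out of a page with XPath expressions, then saving them to a local database. Everything specific in it is an assumption for illustration: the page markup, the class names, the `products` table, and the function names `extract_product` and `save_product` are all hypothetical. It uses only the standard library, whose `xml.etree.ElementTree` supports just a subset of XPath and requires well-formed markup; for real-world HTML, a third-party parser such as lxml is a more likely choice.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup. A real page would be downloaded
# first and would usually need a more forgiving HTML parser.
PAGE = """
<html>
  <head><title>Example Shop</title></head>
  <body>
    <div class="product">
      <h1 class="name">Blue Widget</h1>
      <span class="price">19.99</span>
    </div>
  </body>
</html>
""".strip()


def extract_product(markup):
    """Pull the site name, product name, and price out of the markup."""
    root = ET.fromstring(markup)
    return {
        # Each value is located with an XPath expression.
        "site": root.findtext("./head/title"),
        "name": root.findtext(".//h1[@class='name']"),
        "price": float(root.findtext(".//span[@class='price']")),
    }


def save_product(conn, product):
    """Insert one scraped record into a local SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (site TEXT, name TEXT, price REAL)"
    )
    conn.execute(
        "INSERT INTO products (site, name, price) VALUES (:site, :name, :price)",
        product,
    )
    conn.commit()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path to persist data.
    conn = sqlite3.connect(":memory:")
    save_product(conn, extract_product(PAGE))
    print(conn.execute("SELECT site, name, price FROM products").fetchall())
```

The same shape carries over to Java or Ruby: parse the page, run a few XPath queries, and write the results to a database. The right-click or keyboard-shortcut trigger is a separate piece and is not covered by this sketch.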