
Web API to extract information from a website


I need a service that extracts the title from a web page and returns it as JSON. I would rather not parse the page myself or spend CPU cycles on it. The call should look something like this:

curl "http://api.someservice.com/fetch?url=google.com&element=title&out=json"

The response from the API would be:

{
    "response": {
        "title": "Google",
        "source": "google.com"
    },
    "status": "success"
}

Any hint would be highly appreciated.


Solution

  • You should have a look at YQL - it's a general-purpose service from Yahoo! that can do this kind of scraping really easily. Try this:

    select * from html where url="google.com" and xpath='//title'
    

    Test it here.
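
As far as I know, Yahoo has since shut down the public YQL endpoint, so the query above may no longer run. If no hosted service is available, the same result can be approximated with a small helper of your own. The following is only a minimal sketch using the Python standard library, not a replacement for YQL: the names `TitleParser`, `extract_title`, `build_response` and `fetch_title` are made up for illustration, the response shape simply mirrors the JSON in the question, and falling back to `http://` for bare host names is an assumption.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Ignore later <title> elements (for example inside inline SVG).
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or None if the document has no usable title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or None


def build_response(source, html):
    """Build a dict shaped like the JSON in the question (hypothetical format)."""
    title = extract_title(html)
    return {
        "response": {"title": title, "source": source},
        "status": "success" if title is not None else "error",
    }


def fetch_title(source, timeout=10):
    """Download the page and return the result as a JSON string."""
    # Assumption: bare host names such as "google.com" are fetched over http.
    url = source if "://" in source else "http://" + source
    request = Request(url, headers={"User-Agent": "title-fetcher-sketch"})
    with urlopen(request, timeout=timeout) as reply:
        charset = reply.headers.get_content_charset() or "utf-8"
        html = reply.read().decode(charset, errors="replace")
    return json.dumps(build_response(source, html), indent=4)
```

Calling `print(fetch_title("google.com"))` would download the page and print the JSON. The parsing is kept separate from the download so that `extract_title` can be tried on any HTML string without network access. This does parse the page on your own machine, which the question wanted to avoid, so treat it as a fallback rather than an equivalent of a hosted API.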