
API to search news by year


I would like to write a Python script that retrieves, e.g., 100 news articles/texts on a given topic from each of the years 2011, 2010, 2009, etc.

I need a search API that meets the following requirements:

  1. Available for free and exposed as a web service.
  2. Returns a given number of objects.
  3. Filters by date; specifically, it lets me get objects from given years.
  4. The response contains fairly long text (e.g. more than 100 words) related to the given keyword.
  5. This text is easy to extract from the response.

For example, I tried the Google Web Search API:

The first 8 results from 2007:
https://ajax.googleapis.com/ajax/services/search/web?q=Obama+daterange%3A2454102-2454467&start=0&rsz=8&v=1.0
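The numbers in that `daterange:` range are Julian Day Numbers (2454102 is 1 January 2007 and 2454467 is 1 January 2008), so the range for any year can be computed rather than looked up. A minimal sketch, assuming the endpoint accepts the same parameters as the URL above (this AJAX API has been deprecated and may no longer respond); `julian_day` and `year_query_url` are hypothetical helper names, and the code only builds the URL without fetching it:

```python
from datetime import date
from urllib.parse import urlencode

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 == 1)
# and the Julian Day Number used by the daterange: operator.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the Julian Day Number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def year_query_url(keyword: str, year: int, start: int = 0, rsz: int = 8) -> str:
    """Build a (hypothetical helper) AJAX Web Search URL limited to one year.

    The range runs from 1 January of `year` to 1 January of the next year,
    which reproduces the 2454102-2454467 range used for 2007 in the question.
    """
    lo = julian_day(date(year, 1, 1))
    hi = julian_day(date(year + 1, 1, 1))
    params = {
        "q": f"{keyword} daterange:{lo}-{hi}",
        "start": start,  # offset of the first result
        "rsz": rsz,      # results per page, as in the original URL
        "v": "1.0",
    }
    return "https://ajax.googleapis.com/ajax/services/search/web?" + urlencode(params)


if __name__ == "__main__":
    print(year_query_url("Obama", 2007))
```

Called with `("Obama", 2007)` it produces the same URL as the one above; paging through more results would mean increasing `start` in steps of `rsz`.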

Points 1 and 2 are fulfilled. Filtering by year is done with the little-known daterange: search operator. Point 5 is fine because the response is JSON. The problem is point 4: the API returns only a title and a short snippet of content. I do get the URL of the page with the full content, but after another GET request it's hard to extract that content from the whole HTML document.
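One crude workaround for the HTML-extraction problem is to keep only the text inside `<p>` elements and ignore anything inside `<script>` or `<style>`. The sketch below uses only the standard library's `html.parser`; `ParagraphExtractor` and `main_text` are hypothetical names, and the `<p>` heuristic is an assumption: it works on many article pages but will also pick up captions, comments or footer text on others.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements, skipping <script>/<style> content."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.paragraphs = []
        self._in_p = False
        self._skip = 0
        self._buf = []

    def _flush(self):
        # Normalise whitespace and store the finished paragraph, if non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            if self._in_p:
                self._flush()  # previous <p> was left unclosed (legal in HTML)
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()  # processes any buffered trailing text first
        if self._in_p:
            self._flush()


def main_text(html: str) -> str:
    """Return the page's paragraph text, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

A page could then be kept only if `len(main_text(html).split()) > 100`, which mirrors requirement 4.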

Do you know of such an API? Or do you have another idea for how to tackle this problem?


Solution

  • The Guardian (a UK newspaper) is quite good at making its data available; it even has Google Docs integration. Check out http://www.guardian.co.uk/open-platform

    Do you need the data to be live, or would a simple dataset meet your needs?
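A sketch of what a per-year query against the Guardian could look like, assuming its Content API search endpoint (`https://content.guardianapis.com/search`) and its `from-date`, `to-date`, `page`, `page-size`, `show-fields` and `api-key` parameters; check these against the Open Platform documentation, and note that `YOUR-API-KEY` is a placeholder. Requesting `show-fields=bodyText` is what would address point 4, since the article body comes back inside the JSON instead of as a separate HTML page. `guardian_year_url` and `body_texts` are hypothetical names.

```python
import json
from urllib.parse import urlencode

# Assumed Content API search endpoint of the Guardian Open Platform.
GUARDIAN_SEARCH = "https://content.guardianapis.com/search"


def guardian_year_url(keyword: str, year: int, page: int = 1,
                      page_size: int = 50, api_key: str = "YOUR-API-KEY") -> str:
    """Build a search URL for articles on `keyword` published in `year`."""
    params = {
        "q": keyword,
        "from-date": f"{year}-01-01",  # assumed ISO date filter parameters
        "to-date": f"{year}-12-31",
        "page": page,
        "page-size": page_size,
        "show-fields": "bodyText",     # ask for the plain-text article body
        "api-key": api_key,            # placeholder, register for a real key
    }
    return GUARDIAN_SEARCH + "?" + urlencode(params)


def body_texts(response_json: str, min_words: int = 100) -> list:
    """Return article bodies with at least `min_words` words.

    Assumes the response shape {"response": {"results": [{"fields":
    {"bodyText": ...}}, ...]}}; missing keys are treated as empty.
    """
    data = json.loads(response_json)
    results = data.get("response", {}).get("results", [])
    texts = []
    for item in results:
        body = item.get("fields", {}).get("bodyText", "")
        if len(body.split()) >= min_words:
            texts.append(body)
    return texts
```

To collect roughly 100 articles per year, one could loop over `(2011, 2010, 2009)`, request pages 1 and 2 with `page_size=50` (in case the API caps the page size), fetch each URL with `urllib.request.urlopen`, and pass the decoded body to `body_texts`.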