I would like to write a Python script that fetches, e.g., 100 news items/texts on a given topic for each of the years 2011, 2010, 2009, etc.
I need a search API that meets the following requirements:
For example, I tried the Google Web Search API.
The first 8 results from 2007:
https://ajax.googleapis.com/ajax/services/search/web?q=Obama+daterange%3A2454102-2454467&start=0&rsz=8&v=1.0
Points 1 and 2 are fulfilled. Filtering by year is done with the not very well-known daterange: search operator. Point 5 is OK, because the response is JSON. The problem is point 4: the API returns only the title and a short snippet of the content. I do have the URL of the page with the full content, but after another GET request it is hard to extract that content from the whole HTML document.
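For reference, the two numbers after daterange: in the URL above are Julian day numbers (here 1 January 2007 to 1 January 2008). A minimal sketch of how I build such a URL for an arbitrary year; the helper names `julian_day` and `year_query_url` are my own, not part of any API:

```python
from datetime import date
from urllib.parse import urlencode

# Offset between Python's proleptic Gregorian ordinal (date.toordinal())
# and the Julian day number of the same calendar date.
JDN_OFFSET = 1721425

BASE_URL = "https://ajax.googleapis.com/ajax/services/search/web"


def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def year_query_url(topic, year, start=0, rsz=8):
    """Build a search URL restricted to one year via the daterange: operator."""
    first = julian_day(date(year, 1, 1))
    last = julian_day(date(year + 1, 1, 1))
    params = {
        "q": f"{topic} daterange:{first}-{last}",
        "start": start,  # offset of the first result
        "rsz": rsz,      # number of results per request
        "v": "1.0",
    }
    return BASE_URL + "?" + urlencode(params)


print(year_query_url("Obama", 2007))
```

Increasing `start` in steps of `rsz` pages through further results for the same year.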
Do you know of such an API? Or do you have another idea for how to tackle this problem?
The Guardian (a UK newspaper) is quite good when it comes to making its data available. They even have Google Docs integration. Check out http://www.guardian.co.uk/open-platform
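A rough sketch of how a per-year query URL for their content search might be built. The endpoint and the parameter names (`from-date`, `to-date`, `page-size`, `show-fields`, `api-key`) are written from memory and are assumptions to verify against the Open Platform documentation; `guardian_year_url` and `YOUR_API_KEY` are hypothetical names of mine:

```python
from urllib.parse import urlencode

# Assumed search endpoint; check the Open Platform docs before relying on it.
GUARDIAN_SEARCH = "https://content.guardianapis.com/search"


def guardian_year_url(topic, year, api_key, page=1, page_size=50):
    """Build a search URL for articles on `topic` published during `year`."""
    params = {
        "q": topic,
        "from-date": f"{year}-01-01",  # assumed parameter name
        "to-date": f"{year}-12-31",    # assumed parameter name
        "page": page,
        "page-size": page_size,        # assumed parameter name
        "show-fields": "bodyText",     # assumed way to request the full text
        "api-key": api_key,
    }
    return GUARDIAN_SEARCH + "?" + urlencode(params)


print(guardian_year_url("Obama", 2010, "YOUR_API_KEY"))
```

If the full text really does come back in the JSON response, that would cover the point about not having to extract the content from HTML yourself.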
Do you need the data to be live, or would a simple dataset meet your needs?