I am trying to use Rails to extract data from Wikipedia, based on a search term.
For example,
1) if I have the string "American Idol", I want to pass it to Wikipedia and get back a list of related articles. My goal is to take the first 3 links and display them on my website.
2) one step further, I would like to extract small pieces of data from Wikipedia, such as the infobox or the first few words of the article.
Any tips?
Thanks!
You don't need to resort to screen-scraping: MediaWiki has a very comprehensive API for precisely this kind of thing. See https://github.com/jpatokal/mediawiki-gateway for a handy Ruby wrapper around it.
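If you would rather see what the API calls look like without a wrapper, here is a minimal sketch using only Ruby's standard library. It assumes the English Wikipedia endpoint, the `list=search` module for part 1, and `prop=extracts` (available on Wikipedia via the TextExtracts extension) for the intro text in part 2. The helper names and the User-Agent string are made up for illustration, and error handling is left out.

```ruby
require "net/http"
require "json"
require "uri"

# Assumed endpoint: English Wikipedia's MediaWiki API.
API_ENDPOINT = URI("https://en.wikipedia.org/w/api.php")

# Build an API URI from a hash of query parameters (always asks for JSON).
def api_uri(params)
  uri = API_ENDPOINT.dup
  uri.query = URI.encode_www_form(params.merge(format: "json"))
  uri
end

# Perform the GET request and parse the JSON body.
def fetch_json(uri)
  request = Net::HTTP::Get.new(uri)
  # Placeholder value; replace with something that identifies your app.
  request["User-Agent"] = "ExampleRailsApp/0.1 (admin@example.com)"
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

# Map a parsed list=search response to title/URL pairs.
def search_links(data)
  data.fetch("query", {}).fetch("search", []).map do |hit|
    slug = URI.encode_www_form_component(hit["title"].tr(" ", "_"))
    { title: hit["title"], url: "https://en.wikipedia.org/wiki/#{slug}" }
  end
end

# Pull the plain-text intro out of a parsed prop=extracts response.
# Returns nil when the response contains no page.
def intro_text(data)
  page = data.fetch("query", {}).fetch("pages", {}).values.first
  page && page["extract"]
end

# Part 1: the first few articles matching a search term.
def top_articles(term, limit = 3)
  params = { action: "query", list: "search", srsearch: term, srlimit: limit }
  search_links(fetch_json(api_uri(params)))
end

# Part 2: the introductory section of one article as plain text.
def article_intro(title)
  params = { action: "query", prop: "extracts", exintro: 1, explaintext: 1, titles: title }
  intro_text(fetch_json(api_uri(params)))
end
```

In a Rails controller you could then call something like `@links = top_articles("American Idol")` and render each `url` and `title` in the view. The parsing is kept separate from the HTTP call so it can be tested without hitting the network.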
Alternatively, if you're only interested in data like infoboxes, see DBpedia, which publishes structured data extracted from Wikipedia in a queryable, database-like form.
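As a rough sketch of the DBpedia route, the snippet below sends a SPARQL query to DBpedia's public endpoint and lists the property/value pairs stored for one resource. It assumes the endpoint at https://dbpedia.org/sparql, the usual `http://dbpedia.org/resource/<Title>` naming scheme, and the standard SPARQL JSON results format. The helper names are hypothetical, and titles containing unusual characters may need extra escaping that is not handled here.

```ruby
require "net/http"
require "json"
require "uri"

# Assumed public SPARQL endpoint for DBpedia.
SPARQL_ENDPOINT = URI("https://dbpedia.org/sparql")

# Build a query that lists every property/value pair of one resource.
def dbpedia_query(title)
  resource = "http://dbpedia.org/resource/#{title.tr(' ', '_')}"
  "SELECT ?property ?value WHERE { <#{resource}> ?property ?value } LIMIT 50"
end

# Build the request URI, asking for SPARQL JSON results.
def dbpedia_uri(title)
  uri = SPARQL_ENDPOINT.dup
  uri.query = URI.encode_www_form(
    query: dbpedia_query(title),
    format: "application/sparql-results+json"
  )
  uri
end

# Flatten the SPARQL JSON bindings into [property, value] pairs.
def bindings_to_pairs(data)
  data.fetch("results", {}).fetch("bindings", []).map do |row|
    [row["property"]["value"], row["value"]["value"]]
  end
end

# Fetch and parse the facts for one article title (performs a network call).
def dbpedia_facts(title)
  bindings_to_pairs(JSON.parse(Net::HTTP.get(dbpedia_uri(title))))
end
```

Calling `dbpedia_facts("American Idol")` should give you an array of pairs that you can filter down to the infobox-style properties you care about.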