Retrieve data from a website (Hadoop)

I have previously been able to retrieve tweets from Twitter using the developer API.

Now I want to retrieve data from a website. It's not click-stream data but the actual content being updated on the website. For example, I want to retrieve the match details that are updated daily on a cricket website such as cricinfo.

Could someone help me with how to do this?

Thanks, Sree

Solution

Have a look at this. You could probably also try the RSS feeds provided by espncricinfo.com for this purpose.
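As a rough illustration of the RSS approach, here is a minimal Python sketch that pulls the title, link and description out of each item in an RSS 2.0 feed using only the standard library. The feed URL, the function names and the sample data are all made up for the example; the real feed address would have to come from the site's own RSS page, and its items may carry different fields.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholder URL: replace it with a feed address published by the site.
FEED_URL = "https://example.com/rss/matches.xml"


def parse_rss_items(xml_text):
    """Return one dict (title, link, description) per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def fetch_feed(url, timeout=10):
    """Download the raw feed bytes from the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


# Invented sample feed so the parser can be tried without network access.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample feed</title>
<item>
  <title>Team A v Team B</title>
  <link>https://example.com/match/1</link>
  <description>Team A won by 5 wickets</description>
</item>
</channel></rss>"""

if __name__ == "__main__":
    # For a live feed, use: parse_rss_items(fetch_feed(FEED_URL))
    for entry in parse_rss_items(SAMPLE):
        print(entry["title"], "-", entry["description"])
```

Running something like this on a schedule (once a day, say) and appending the records to a file would give you data that can then be copied into HDFS for processing.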