I've never had as much trouble scraping a web page as I'm having with this one. I am trying to parse the reviews from Omgili's API results page. An example page is located here:
I have scraped lots of pages before, but finding the exact XPath for the results on this page is really tricky, since there are no div class names and there are about five nested tables. I would like an XPath expression that returns all of the table rows for each result (e.g. for the first result, the TR that contains the first review, "Does exactly what it needs to do - [03 Feb 2010]", and its content).
Can anyone help with this, or at least point me to a resource that could help? I have tried the Chrome SelectorGadget extension, but even that doesn't work on this site.
So far I have tried the following, but it fails: //table//table//tr[4]//table/tr/td[1]/table/tr
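One common reason long absolute paths like this fail (an assumption here, since the page markup isn't shown) is the child step `table/tr`: if a `<tbody>` sits between `<table>` and its rows, as browsers insert into the DOM and many pages include explicitly, `/tr` directly under `table` matches nothing. The sketch below uses lxml and a small hypothetical snippet, not the real Omgili page, to show the difference between a child step and a descendant step.

```python
from lxml import html

# Hypothetical markup: an explicit <tbody> between <table> and its rows.
SAMPLE = """
<html><body>
<table><tbody>
  <tr><td>first row</td></tr>
  <tr><td>second row</td></tr>
</tbody></table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Child step: requires <tr> to be an immediate child of <table>,
# so rows nested inside <tbody> are missed.
strict = tree.xpath("//table/tr")

# Descendant step: tolerates the intervening <tbody>.
loose = tree.xpath("//table//tr")

print(len(strict), len(loose))  # 0 2
```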
I'd be tempted to cheat (if it works!) and rely on the fact that the review links are the only links on that page whose href targets start with jmp. So
//tr[td/span/a[starts-with(@href, 'jmp')]]
should be the rows you want.
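A minimal sketch of applying that expression with lxml. The HTML below is a hypothetical stand-in that only mimics the shape the XPath relies on (a `td/span/a` whose `href` begins with `jmp`); the real page structure and the `jmp?id=...` hrefs are assumptions.

```python
from lxml import html

# Hypothetical stand-in markup; not the real Omgili page.
SAMPLE = """
<html><body>
<table>
  <tr><td><span><a href="jmp?id=1">Does exactly what it needs to do</a></span></td></tr>
  <tr><td>Review body text</td></tr>
  <tr><td><span><a href="/about">Not a review link</a></span></td></tr>
  <tr><td><span><a href="jmp?id=2">Another review title</a></span></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Rows containing td/span/a whose href starts with "jmp".
rows = tree.xpath("//tr[td/span/a[starts-with(@href, 'jmp')]]")

# Pull the link text out of each matched row.
titles = [str(row.xpath("string(td/span/a)")) for row in rows]
print(titles)  # ['Does exactly what it needs to do', 'Another review title']
```

Because the predicate tests a property of the link rather than a position in the table nesting, it keeps working even if the number of wrapper tables changes.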