xpath screen-scraping

What is the XPATH of these table rows on this page? I can't figure it out!


I've never had as much trouble scraping a web page as I have with this one. I am trying to parse the reviews from Omgili's API results page. An example page is located here:

Omgili

I have scraped lots of pages before, but the exact XPath for the results on this page is really tricky, since there are no DIV class names and there are about 5 nested tables. I would like an XPath that returns the table rows for each result (e.g. the first result would be the TR that contains the first review, "Does exactly what it needs to do - [03 Feb 2010]", and its content).

Can anyone help with this, or at least point me to a resource that can? I have tried the Chrome SelectorGadget extension, but even that does not work for this site.

So far I have tried the following, but it fails: //table//table//tr[4]//table/tr/td[1]/table/tr


Solution

  • I'd be tempted to cheat (if it works!) and note that the review links are the only links on that page with targets that start with jmp. So

    //tr[td/span/a[starts-with(@href, 'jmp')]]
    

    should be the rows you want.
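
If you want to see how that predicate behaves before pointing it at the real page, here is a rough Python sketch. The markup in SAMPLE is a made-up, well-formed stand-in for the nested tables (not the actual Omgili page), and review_rows is just an illustrative helper name. The standard library's ElementTree only supports a subset of XPath and has no starts-with(), so the sketch walks every tr and applies the same test in Python. With a full XPath engine and an HTML parser you should be able to pass the expression above as-is.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the nested-table results page.
SAMPLE = """
<html><body>
<table><tr><td>
  <table>
    <tr><td><a href="about.html">About</a></td></tr>
    <tr><td><span><a href="jmp?id=1">Does exactly what it needs to do</a></span></td></tr>
    <tr><td><span>No link in this row</span></td></tr>
    <tr><td><span><a href="jmp?id=2">Second review</a></span></td></tr>
  </table>
</td></tr></table>
</body></html>
"""


def review_rows(root):
    """Mimic //tr[td/span/a[starts-with(@href, 'jmp')]] on an ElementTree."""
    rows = []
    # root.iter("tr") visits every <tr> at any depth, like //tr.
    for tr in root.iter("tr"):
        # td/span/a is relative to the row, as in the XPath predicate.
        links = tr.findall("td/span/a")
        if any(a.get("href", "").startswith("jmp") for a in links):
            rows.append(tr)
    return rows


root = ET.fromstring(SAMPLE)
rows = review_rows(root)
# Take the text of the first td/span/a link in each matching row.
titles = [row.find("td/span/a").text for row in rows]
print(titles)
```

Note that the outer row and the "About" row are skipped: neither has a td/span/a chain with a jmp target, which is the whole point of keying on the link rather than on table position.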