I have a website I want to crawl. It contains multiple items that I wish to extract on each page.
It's very similar to an online yellow pages: each item has a title, phone number and category. As this obviously isn't enough info to fill an entire page on its own, the items are shown in a list, with some pages containing 3 items and others containing 10 or so.
--Edit 1-- I've successfully scraped many websites, but in every case it was possible to reach a page containing only one item. That is not possible here, and because the pages need different templates, the scraper returns multiple items as one item or just random bits and pieces.
Portia doesn't yet support extracting multiple items per page. There is an issue for it, and there is sufficient interest that it will be done soon.
In the meantime, one trick is to nest the items within a parent item (using 'variants') and split them later into separate items in a post-processing step.
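A minimal sketch of that post-processing step in Python follows. It assumes each scraped parent item is a dict whose nested items sit in a list under a "variants" key, with any other fields (such as the page URL) shared by all of them. That layout, the function name split_variants and the sample data are assumptions for illustration, so check them against your actual output before relying on it.

```python
from typing import Any, Dict, Iterator


def split_variants(
    parent: Dict[str, Any], variants_key: str = "variants"
) -> Iterator[Dict[str, Any]]:
    """Yield one flat item per nested variant of a scraped parent item.

    Assumes (hypothetically) that the nested items are a list of dicts
    stored under `variants_key`; all other parent fields are copied
    into every resulting item.
    """
    # Fields that belong to the page as a whole, e.g. its URL.
    shared = {k: v for k, v in parent.items() if k != variants_key}
    variants = parent.get(variants_key) or []

    if not variants:
        # No nested items: pass the parent through unchanged.
        yield shared
        return

    for variant in variants:
        item = dict(shared)   # copy so the items stay independent
        item.update(variant)  # variant fields override shared ones
        yield item


if __name__ == "__main__":
    # Made-up sample data in the assumed layout.
    scraped = {
        "url": "http://example.com/listing?page=1",
        "variants": [
            {"title": "Acme Plumbing", "phone": "555-0100", "category": "Plumbers"},
            {"title": "Best Bakery", "phone": "555-0101", "category": "Bakeries"},
        ],
    }
    for item in split_variants(scraped):
        print(item)
```

You would run this over every parent item in the exported output, for example from a small script that reads the exported items one by one, and write the resulting flat items out as the final dataset.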