python-2.7 web-crawler scrapy portia

How do I use regex in Portia (visual Scrapy)?


I am able to annotate web pages using the Portia web crawler. My question is: how can I use a regex while extracting the data?

For example:

I have extracted the Location field from a page.

The output looks like:

Location : Location xyz,abc

But I need only the xyz,abc values.

I have googled for solutions, but could not find much information.

Could you explain how regex works in Portia?


Solution

  • You need to use a capture group to extract the data. In this case:

    Location: (.*)
    

    This tells Portia to extract all of the text following the Location: string.

    If, for example, you only wanted to extract the text between Location: and the comma, you could use the following:

    Location: (.*),
    

    You can also place literal text inside the capture group, so that the text it matches is included in the extracted value along with whatever follows it.
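
To check how these patterns behave before entering them in Portia, they can be tried with Python's standard re module. This is only a sketch: the sample string and the extract helper are made up for illustration, and it assumes Portia keeps the first capture group of the first match, as described above.

```python
import re


def extract(pattern, text):
    """Return the first capture group of the first match, or None.

    Hypothetical helper that mimics the behaviour described in the answer;
    it is not part of Portia or Scrapy.
    """
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Assumed sample of the raw field text; adjust it to match your page.
raw = "Location: xyz,abc"

# Everything after the "Location: " label.
everything = extract(r"Location: (.*)", raw)

# Text between the label and a comma. ".*" is greedy, so with several
# commas this runs up to the LAST comma.
up_to_last_comma = extract(r"Location: (.*),", "Location: a,b,c")

# The lazy form ".*?" stops at the FIRST comma instead.
up_to_first_comma = extract(r"Location: (.*?),", "Location: a,b,c")

# Literal text placed inside the group is kept in the result.
with_label = extract(r"(Location: .*)", raw)

print(everything, up_to_last_comma, up_to_first_comma, with_label, sep="\n")
```

If the real field text differs from the sample (for instance an extra space before the colon), the literal part of the pattern has to be adjusted to match, otherwise nothing is extracted.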