
Recursive sort of regular expression


I'm trying to build a regular expression to extract a substring like:

country:India provinces:Uttranchal city:Dehradun zip_code:12345

from a string like this:

keyword: one two three country:India provinces:Uttranchal city:Dehradun zip_code:12345 filter: myparameter

Now I have prepared a basic regex like:

country:\w+|provinces:\w+|city:\w+|zip_code:\w+

This works as long as country, provinces, and city are single words.

But it fails when they are not, for example:

keyword: one two three country:United-States provinces:Manhattan city:New-York zip_code:12345 filter: myparameter

The regex above does not work here because \w does not match non-word characters such as -.

You can assume that country, provinces, or city can consist of several words joined by -,

for example:

country:United-States-of-America provinces:Washington-Dc city:New-York-West

and so on.

So -\w+ is a kind of repeating pattern that occurs zero or more times in country, provinces, city, or all of them.

I also tried building a regex for this:

(country:\w+(-\w+)*)|(provinces:\w+(-\w+)*)|(city:\w+(-\w+)*)|(zip_code:\w+(-\w+)*)

This matches, but as the attached Rubular screenshot shows, the output also contains unwanted entries and nil.

All I want is to avoid the unwanted and nil entries, which cause problems when separating the desired string from the input string. Alternatively, can somebody suggest a better regex than this?


Solution

  • If you want to match a run of one or more word characters (letters, digits, underscore) and dashes, you can use square brackets to define a character set: [\w\-]+

    country:[\w\-]+|provinces:[\w\-]+|city:[\w\-]+|zip_code:[\w\-]+
    

    Short example in Python 3:

    >>> import re
    >>> s = "keyword: one two three country:United-States provinces:Manhattan city:New-York zip_code:12345 filter: myparameter"
    >>> print(re.findall(r"country:[\w\-]+|provinces:[\w\-]+|city:[\w\-]+|zip_code:[\w\-]+", s))
    ['country:United-States', 'provinces:Manhattan', 'city:New-York', 'zip_code:12345']
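
The extra entries and nil values in the question most likely come from the capturing groups: Ruby's `scan` (which Rubular uses) returns one slot per group and fills the groups that did not take part in a match with nil, and Python's `re.findall` does the same with empty strings. If you would rather keep the stricter `\w+(-\w+)*` shape from the question, which rejects a trailing or doubled dash, a minimal sketch is to make the inner groups non-capturing with `(?:...)`. The names `STRICT_RE`, `FIELD_RE` and `parse_fields` below are made up for illustration, and the key list is assumed to be exactly the four keys from the question.

```python
import re

# Sample input assembled from the strings in the question.
s = ("keyword: one two three country:United-States-of-America "
     "provinces:Washington-Dc city:New-York-West zip_code:12345 "
     "filter: myparameter")

# Non-capturing groups (?:...) only: findall returns the whole match
# for each hit, with no per-group slots left empty.
STRICT_RE = re.compile(r"(?:country|provinces|city|zip_code):\w+(?:-\w+)*")
matches = STRICT_RE.findall(s)
print(matches)

# Exactly two capturing groups (key and value): findall returns
# (key, value) tuples, which convert directly into a dict.
FIELD_RE = re.compile(r"(country|provinces|city|zip_code):(\w+(?:-\w+)*)")


def parse_fields(text):
    """Return a dict mapping each recognised key to its value."""
    return dict(FIELD_RE.findall(text))


print(parse_fields(s))
```

The difference from `[\w\-]+` shows up only on malformed values: the character-set version accepts `city:New-York-` including the trailing dash, while the strict version stops at `city:New-York`.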