I'm trying to build a regular expression to extract a substring like:
country:India provinces:Uttranchal city:Dehradun zip_code:12345
from a string like this:
keyword: one two three country:India provinces:Uttranchal city:Dehradun zip_code:12345 filter: myparameter
Now I have prepared a basic regex like:
country:\w+|provinces:\w+|city:\w+|zip_code:\w+
This more or less works for me if country, provinces and city are single words.
But it fails when they are not, for example:
keyword: one two three country:United-States provinces:Manhattan city:New-York zip_code:12345 filter: myparameter
The above regex does not work here because \w does not match non-word characters such as -
You can assume that the country, province or city can consist of several words joined by -
like
country:United-States-of-America provinces:Washington-Dc city:New-York-West
and so on, so -\w+ is a repeating pattern that can occur zero or more times in country, provinces, city, or all of them.
I also tried building a regex for this, something like:
(country:\w+(-\w+)*)|(province:\w+(-\w+)*)|(city:\w+(-\w+)*)|(zip_code:\w+(-\w+)*)
This does match, but as you can see in the attached Rubular screenshot, the result also contains unwanted output and nil entries.
All I want is to avoid the unwanted output and the nil entries, which cause problems when separating the desired string from the input string. Alternatively, can somebody suggest a better regex than this?
If you want to match a run of one or more word characters (letters, digits, underscore) and dashes, you can use brackets to define a character class: [\w\-]+
country:[\w\-]+|provinces:[\w\-]+|city:[\w\-]+|zip_code:[\w\-]+
Short example in Python:
>>> import re
>>> s = "keyword: one two three country:United-States provinces:Manhattan city:New-York zip_code:12345 filter: myparameter"
>>> print(re.findall(r"country:[\w\-]+|provinces:[\w\-]+|city:[\w\-]+|zip_code:[\w\-]+", s))
['country:United-States', 'provinces:Manhattan', 'city:New-York', 'zip_code:12345']
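As for the extra entries in the second attempt from the question: they most likely come from the capturing parentheses. When a pattern contains capture groups, Ruby's String#scan (and Python's re.findall) returns the groups rather than the whole match, and groups that did not take part in a match show up as nil (empty strings in Python). Below is a minimal sketch, assuming you want to keep the stricter word(-word)* shape, which uses non-capturing groups (?:...) instead. The names FIELD_PATTERN and extract_fields are made up for this illustration. Unlike the character class [\w\-]+, this form does not accept a value that starts or ends with a dash.

```python
import re

# (?:...) groups without capturing, so findall returns whole matches
# instead of one tuple of groups per match.
# FIELD_PATTERN and extract_fields are hypothetical names for this sketch.
FIELD_PATTERN = re.compile(r"(?:country|provinces|city|zip_code):\w+(?:-\w+)*")


def extract_fields(text):
    """Return every key:value token for the four known keys, in input order."""
    return FIELD_PATTERN.findall(text)


sample = ("keyword: one two three country:United-States-of-America "
          "provinces:Washington-Dc city:New-York-West zip_code:12345 "
          "filter: myparameter")
fields = extract_fields(sample)

# Join the tokens back into the single filtered string from the question.
print(" ".join(fields))
```

The same pattern should work unchanged in Ruby, since (?:...) is supported there as well.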