Extract a portion of text using RegEx


I would like to extract a portion of a text using a regular expression. For example, I have an address and want to return just the number and street, excluding the rest:

2222 Main at King Edward Vancouver BC CA

But the addresses vary in format most of the time. I tried using a lookahead regex and came up with this expression:

.*?(?=\w* \w* \w{2}$)

This expression handles the example above nicely, but it gets far too messy as soon as commas appear in the text, or postal codes, which can be a single 6-character string or two 3-character strings separated by a space, and so on.
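To make the problem concrete, here is a minimal sketch of how the expression behaves. Python is an assumption (the question does not name a language), and `street_part` is a hypothetical helper name used only for illustration.

```python
import re

# The expression from the question, unchanged.
PATTERN = re.compile(r".*?(?=\w* \w* \w{2}$)")

def street_part(address):
    """Return the text before the trailing 'word word XX', or None if no match."""
    match = PATTERN.match(address)
    return match.group(0).strip() if match else None

# Space-separated input: the city, province and country are cut off as intended.
print(street_part("2222 Main at King Edward Vancouver BC CA"))
# -> 2222 Main at King Edward

# With commas, the lookahead first succeeds just before " BC CA",
# so the city (and a trailing comma) leaks into the result.
print(street_part("2222 Main at King Edward, Vancouver, BC CA"))
# -> 2222 Main at King Edward, Vancouver,
```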

Is there a more elegant way of extracting a portion of text than a lookaround regex?

Any suggestion or a point in another direction is greatly appreciated.

Thanks!


Solution

  • Regular expressions are for data that is REGULAR, that follows a pattern. So if your data is completely random, no, there's no elegant way to do this with regex.

    On the other hand, if you know what values you want, you can probably write a few simple regexes, and then just test them all on each string.

    For example: regex1 = address number grabber, regex2 = street type grabber, regex3 = name grabber.

    Attempt a match on string1 with regex1, regex2, and finally regex3. Move on to the next string.
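The approach above could be sketched roughly as follows. This is only an illustration in Python (an assumed language): the three patterns, the short list of street types, and the `extract_parts` helper are all hypothetical and would need tuning against real data.

```python
import re

# Assumed, deliberately short list of street types; extend it for real data.
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr)"

# One simple regex per piece of information, tried in this order.
PATTERNS = [
    # regex1: address number at the start of the string
    ("number", re.compile(r"^\s*(\d+)\b")),
    # regex2: the first recognised street type
    ("street_type", re.compile(rf"\b({STREET_TYPES})\b", re.IGNORECASE)),
    # regex3: the name between the number and the street type
    ("name", re.compile(rf"^\s*\d+\s+(.+?)\s+{STREET_TYPES}\b", re.IGNORECASE)),
]

def extract_parts(address):
    """Try every pattern on one address; a part that does not match stays None."""
    parts = {}
    for label, pattern in PATTERNS:
        match = pattern.search(address)
        parts[label] = match.group(1) if match else None
    return parts

addresses = [
    "123 Granville St, Vancouver, BC V6C 1T2",
    "2222 Main at King Edward Vancouver BC CA",
]
for address in addresses:
    print(extract_parts(address))
# -> {'number': '123', 'street_type': 'St', 'name': 'Granville'}
# -> {'number': '2222', 'street_type': None, 'name': None}
```

The second address shows the limit of the idea: it contains no street type from the assumed list, so only the number is recovered. Strings that follow no pattern at all still need a different strategy.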