Here is a string similar to what I'm trying to match (with the exception of a couple of specific patterns, for the sake of simplicity).
Hello, tonight I'm in the town of Trenton in New Jersey and I will be staying in Hotel HomeStay [123] and I have no money.
I'm trying to match only the last part: "in Hotel HomeStay [123]".
I'm not very familiar with regex concepts like lookahead and lookbehind right now. Similar questions here don't seem to solve my issue. I've tried a number of regexes (to the best of my understanding), and this is what I came up with:
(?= (?:in|\d+))([\w \[]*\s*\d*\]*)(?!.*in)
The digits and special characters may be part of what I'm actually trying to match.
The lookahead and lookbehind patterns are not restricted to containing only "in". They can include other common words as well, such as "and" and "is". I'm only looking for the last occurrence of any of these, followed by the main pattern, which is quite distinctive. Edit: let's say the match must contain either HomeStay or LuxuryInn, for the sake of the example.
However, this matches the whole of "in the town of Trenton in New Jersey and I will be staying in Hotel HomeStay [123]".
Where am I going wrong? Also, could someone explain why the "in" is captured despite being placed in a non-capturing group?
Any help is greatly appreciated.
If you want to retrieve text containing HomeStay that is prefixed by certain words and does not itself contain those words, you can use a capture group with a negative look-ahead inside. The regex below captures all occurrences (working fiddle).
\b(?:in|and|is)\s+((?:.(?!\b(?:in|and|is)\b))*HomeStay(?:.(?!\b(?:in|and|is)\b))*)
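As a quick check, the pattern above can be run against the sample sentence from the question. This is a minimal sketch using Python's re module; the variable names are illustrative, and other regex engines may differ in small details.

```python
import re

# Pattern copied from the answer; the names below are illustrative only.
pattern = re.compile(
    r"\b(?:in|and|is)\s+"
    r"((?:.(?!\b(?:in|and|is)\b))*HomeStay(?:.(?!\b(?:in|and|is)\b))*)"
)

text = ("Hello, tonight I'm in the town of Trenton in New Jersey and I will "
        "be staying in Hotel HomeStay [123] and I have no money.")

# findall returns the contents of capture group 1 for every match.
matches = pattern.findall(text)
print(matches)
```

The leading word ("in") is part of the overall match but sits outside the capture group, so only the text after it is returned.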
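Here, the regexp looks for:
- "in", "and" or "is" as a whole word (delimited by the word boundary \b and the whitespace that follows),
- one or more whitespace characters (\s+),
- a capture group containing any run of characters that are not followed by one of those whole words, then HomeStay, then again any run of characters that are not followed by one of those whole words.
If you just want the last occurrence, you can add another negative look-ahead after the capture group (fiddle).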
\b(?:in|and|is)\s+((?:.(?!\b(?:in|and|is)\b))*HomeStay(?:.(?!\b(?:in|and|is)\b))*)(?!.*HomeStay.*)
Same as above, except that the matched text must not be followed by any text containing HomeStay.
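To see the difference between the two patterns, here is a small sketch in Python. The test sentence with two HomeStay mentions is made up for illustration and is not from the question.

```python
import re

STOP = r"\b(?:in|and|is)\b"          # whole-word separators
BODY = rf"(?:.(?!{STOP}))*"          # characters not followed by a separator

# First pattern from the answer: every occurrence.
all_pattern = re.compile(rf"\b(?:in|and|is)\s+({BODY}HomeStay{BODY})")
# Second pattern: same, plus a look-ahead rejecting any later HomeStay.
last_pattern = re.compile(
    rf"\b(?:in|and|is)\s+({BODY}HomeStay{BODY})(?!.*HomeStay.*)"
)

# Hypothetical input containing two candidate matches.
text = "We slept in Hotel HomeStay [123] and later in HomeStay Annex [45] and left."

all_matches = all_pattern.findall(text)
last_match = last_pattern.search(text).group(1)

print(all_matches)
print(last_match)
```

Building the pattern from the STOP and BODY pieces is only a readability choice; the assembled strings are the same regexes as written above.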
Finally, if the matching text has to contain at least one word from a list, just replace both occurrences of HomeStay with a list of alternatives (fiddle). Example for HomeStay and Luxury: (?:HomeStay|Luxury)
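A sketch of that substitution in Python, assuming the alternation is dropped into both places where HomeStay appeared in the last-occurrence pattern. The sample sentence is invented for illustration.

```python
import re

NAMES = r"(?:HomeStay|Luxury)"       # list of alternatives from the answer
STOP = r"\b(?:in|and|is)\b"          # whole-word separators
BODY = rf"(?:.(?!{STOP}))*"          # characters not followed by a separator

# Last-occurrence pattern with HomeStay replaced by the alternation,
# both inside the capture group and in the trailing look-ahead.
pattern = re.compile(
    rf"\b(?:in|and|is)\s+({BODY}{NAMES}{BODY})(?!.*{NAMES}.*)"
)

# Hypothetical input mentioning both names.
text = ("Yesterday I slept in Hotel HomeStay [123] and tonight I am "
        "in LuxuryInn [9] and I have no money.")

result = pattern.search(text).group(1)
print(result)
```

Because Luxury is a prefix of LuxuryInn, the shorter alternative is enough here; the rest of the name is picked up by the trailing part of the capture group.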