I am trying to write a regex for a set of words. I want to select only the words that consist of one or more repetitions of a substring.
For instance, among the following words:
banana baba nano nana nanna
I only want to select the words banana, baba and nana, and do NOT want to select nano and nanna. What I am trying to find are words that contain ba or na one or more times, and nothing other than one or more instances of ba or na. Therefore, nanna shouldn't be selected, because it contains an extra n between the two na's.
I tried quite a few regexes but couldn't get the exact results. So far, this is the regex where I am stuck:
\w+(ba|na)
This selects nanna as well, which I don't want. I am new to regex and have tried quite a few examples and tutorials, and also looked around for a while.
P.S I am using this website to test my regex.
\b(?:[bn]a)+\b
Demo: https://regex101.com/r/iFRfBC/1
Explanation:
\b
- Matches a "word boundary", preventing additional letters from preceding the match (or following it, at the end of the pattern).
(?: ... )+
- A (non-capturing) group, quantified one or more times.
[bn]a
- A literal b or n, followed by an a.
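To check the pattern outside the online tester, here is a minimal sketch using Python's built-in re module. This assumes Python's regex flavor, which supports both \b and (?: ... ); the function name select_words is made up for illustration and is not part of the original answer. It also runs the pattern from the question for comparison.

```python
import re

# Pattern from the answer: a whole word made up only of "ba"/"na" units.
SYLLABLE_WORD = re.compile(r"\b(?:[bn]a)+\b")

# Pattern from the question, kept for comparison.
ORIGINAL = re.compile(r"\w+(ba|na)")


def select_words(text):
    """Return every word in text that consists solely of ba/na repetitions.

    Hypothetical helper name, used only for this illustration.
    """
    return SYLLABLE_WORD.findall(text)


words = "banana baba nano nana nanna"

# The answer's pattern skips nano and nanna.
print(select_words(words))

# The question's pattern also matches nanna, because \w+ can absorb "nan"
# before the final "na". finditer is used so we get the full match text
# rather than the contents of the capturing group.
print([m.group(0) for m in ORIGINAL.finditer(words)])
```

The first print should list banana, baba and nana; the second should additionally include nanna, which is the behavior described in the question.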