I have the string below, stored in a file named aaa:
333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111
The left and right square brackets in the string may not be balanced, so I want to match the whole run of lowercase letters and square brackets as a single string. I'm using grep -o '[a-z\[\]]*' aaa to get the following as one match:
aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc
But it returns three kinds of matches: a single lowercase letter, a single left square bracket, or a single lowercase letter followed by one or more right square brackets.
So I tried grep -o '[a-z\[]*' aaa. It returns two kinds of matches: runs of lowercase letters mixed with left square brackets, and runs of lowercase letters alone. That's closer to the result I want, but still not correct.
Is it possible to get the expected result using only grep -o and a bracket expression that matches square brackets?
Since you did not tell grep to do otherwise, it is using POSIX Basic Regular Expression (BRE) syntax. Your regex consists of a bracket expression, followed by a lone right bracket, followed by an asterisk:
grep -o '[a-z\[\]]*'  →  [a-z\[\]   ]   *
So, your expression tells grep to look for one character that is a to z, a backslash, or a left bracket; then zero or more right brackets.
Backslashes are not special inside a BRE bracket expression. Nor are left brackets. As the reference above states, and @tshiono notes in the comments, to include a right bracket inside the bracket expression it must appear first.
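To see this parse in action, here is a small sketch that feeds the original expression two short test strings. The inputs and the variable names pieces and backslash are made up for illustration; they are not from the question.

```shell
# Hypothetical test inputs, chosen only to illustrate how the regex is parsed.
# '[a-z\[\]' is the bracket expression (a-z, backslash, left bracket);
# the ']*' that follows means "zero or more right brackets".
pieces=$(printf '%s\n' 'x]]y[' | grep -o '[a-z\[\]]*')
printf '%s\n' "$pieces"

# A literal backslash is a member of the set, so it starts a match too.
backslash=$(printf '%s\n' '\]]' | grep -o '[a-z\[\]]*')
printf '%s\n' "$backslash"
```

With GNU grep, the first command should split its input into x]], y and [ on separate lines, and the second should print \]] as a single match, which is consistent with the three kinds of matches described in the question.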
This leads to the slightly odd-looking regex [][a-z] or, equivalently, the even odder-looking []a-z[].
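As a quick check, the corrected bracket expression can be run against the sample string from the question. This is a minimal sketch assuming GNU grep; the string is piped in rather than read from the file aaa so the snippet is self-contained, and the variable names input and result are only for illustration.

```shell
# Sample string copied from the question.
input='333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111'

# ']' comes first inside the bracket expression, so it is taken literally;
# '[' and a-z follow as ordinary members of the set.
result=$(printf '%s\n' "$input" | grep -o '[][a-z]*')
printf '%s\n' "$result"
```

The equivalent form grep -o '[]a-z[]*' should behave the same way, since only the position of the right bracket matters.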
Had you used grep's -E option, you would have seen the same result, since "The rules for ERE Bracket Expressions are the same as for Basic Regular Expressions".
However, if your grep supports -P (Perl syntax), your original regex would give the result you intended.
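A hedged sketch of the -P variant follows. Not every grep build includes Perl-compatible regex support, so the snippet probes for it first; the variable names input and perl_result are assumptions made for this example.

```shell
# Sample string copied from the question.
input='333333444444aaa[aaa[[bb[b[ccc]zzz]xx[x]cc]]cc222222211111111'

# -P is not available in every grep build, so probe for it first.
if printf 'a\n' | grep -qP 'a' 2>/dev/null; then
    # In Perl-compatible syntax, \[ and \] are escapes inside a character class,
    # so the original regex from the question means what was intended.
    perl_result=$(printf '%s\n' "$input" | grep -oP '[a-z\[\]]*')
    printf '%s\n' "$perl_result"
else
    perl_result=''
    echo 'this grep does not support -P' >&2
fi
```

Where -P is unavailable, the BRE form with the right bracket listed first is the portable choice.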