I'm trying to create a string pattern that will match both non-space characters, and all characters inside a set of brackets. For example, a sequence such as this:
local str = [[
This [pattern] should [return both] non-space
characters and [everything inside] brackets
]]
Would print out This, [pattern], should, [return both], non-space, etc. I've been going at this for a while and came up with a very close solution; I know what its problem is, but cannot seem to solve it. Here's my attempt:
local str = [[
This [pattern] should [return both] non-space
characters and [everything inside] brackets
]]
for s in string.gmatch(str, "%S+%[?.-%]?") do
print(s)
end
The issue is that spaces should be allowed inside the brackets, but not outside. This would print something like: This, [pattern], should, [return, both], non-space, etc. Notice that [return and both] are two different captures, as opposed to a single [return both]. I'm still sort of new to string patterns, so I feel like there are a few options I could be overlooking. Anyway, if anyone is experienced with this sort of thing, I sure would appreciate some insight.
Just to explain Egor's solution in the comment a bit: the key idea is to differentiate the whitespace inside the brackets [] from the whitespace outside them. This is accomplished by gsub-ing the whitespace outside the brackets, replacing it with \0, and then running gmatch over the string, matching against non-null characters. The null char \0 is used as a sentinel since it's unlikely to clash with a legitimate character in the input text.
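To make the description above concrete, here is a minimal sketch of that sentinel idea. It is my own reconstruction, not necessarily the code from Egor's comment. The function name split_keep_brackets and the segmenting pattern are assumptions chosen for illustration. It also relies on %Z (the complement of the deprecated %z class), which the stock Lua 5.1 through 5.4 interpreters still accept as far as I know.

```lua
-- Hypothetical helper: split on whitespace, but keep [...] groups whole.
local function split_keep_brackets(str)
  -- Walk the string in segments. A segment that starts with "[" is a
  -- bracketed group and is left untouched; any other segment is "outside"
  -- text, so its whitespace runs are replaced by the sentinel \0.
  local marked = str:gsub("(%[?)([^%[%]]*)(%]?)", function(open, body, close)
    if open == "" then
      body = body:gsub("%s+", "\0")
    end
    return open .. body .. close
  end)

  -- Every maximal run of non-null characters is now one piece.
  local pieces = {}
  for piece in marked:gmatch("%Z+") do
    pieces[#pieces + 1] = piece
  end
  return pieces
end

local sample = "This [pattern] should [return both] non-space"
for _, piece in ipairs(split_keep_brackets(sample)) do
  print(piece)
end
```

This sketch assumes brackets are not nested and that the input contains no \0 bytes of its own.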
A variation on this approach is to replace the whitespace inside the brackets instead, and then match against non-whitespace characters:
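-- Replace whitespace inside each [...] with the sentinel \0, split the
-- result on whitespace, then turn each sentinel run back into a space.
for s in str:gsub("(%[.-%])", function(x)
  return (x:gsub("%s+", "\0"))
end):gmatch("%S+") do
  print((s:gsub("%z+", " ")))
end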
Note that you are creating intermediate strings during the parse. If the input string is long then so is the temporary intermediate string. For one-off matches this is probably okay. If you're dealing with more heavy-duty parsing I suggest checking out LPEG.
For example, the following lpeg.re grammar can parse the given input text:
local re = require 're'
local str =
[[
This [pattern] should [return both] non-space
characters and [everything inside brackets]
]]
local pat = re.compile
[[
match_all <- %s* match_piece+ !.
match_piece <- {word / bracket_word} %s*
word <- ([^]%s[])+
bracket_word <- '[' (word %s*)+ ']'
]]
for _, each in ipairs{ pat:match(str) } do
print(each)
end
Outputs:
This
[pattern]
should
[return both]
non-space
characters
and
[everything inside brackets]
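If adding LPEG is not an option, the intermediate strings mentioned earlier can also be avoided in plain Lua by scanning with string.find and a moving start position. The following is only a sketch under my own assumptions: bracket_tokens is a hypothetical name, and it uses %b[] so nested brackets are kept together. Unlike the approaches above, text glued to a bracket (such as foo[bar baz]) is split into foo and [bar baz].

```lua
-- Hypothetical iterator: yields words and balanced [...] groups in order,
-- without building a modified copy of the input.
local function bracket_tokens(str)
  local pos = 1
  return function()
    -- Skip leading whitespace; stop when only whitespace remains.
    local start = str:find("%S", pos)
    if not start then return nil end

    -- Prefer a balanced bracket group starting exactly here.
    local s, e = str:find("^%b[]", start)
    if not s then
      -- Otherwise take a word: an optional unmatched "[" followed by
      -- characters up to the next whitespace or "[".
      s, e = str:find("^%[?[^%s%[]*", start)
    end
    pos = e + 1
    return str:sub(s, e)
  end
end

for token in bracket_tokens("This [pattern] should [return both] non-space") do
  print(token)
end
```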