str = "cat,dog,,horse"
for word in string.gmatch(str, "([^,'',%s]+)") do
    print(word)
end
This code outputs the following.
cat
dog
horse
I want the empty entry to be included as well (as nil), so that I get the following output.
cat
dog
nil
horse
How can this be done? Could someone please point me in the right direction?
A few things:
- `nil ~= ""`. You probably want the empty string rather than nil here. It is however trivial to convert one into the other, so I'll be using the empty string in the following code.
- You don't need the parentheses in the `gmatch` pattern. If there are no "captures" (parentheses), the entire pattern is implicitly captured.
- You have `'` and `,` more than once in the character class; just once suffices. I'll be assuming you want to split by `,`.

The issue is that your pattern currently uses the `+` (one or more) quantifier when you want `*` (zero or more). Just using `*` works completely fine on Lua 5.4:
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
> local str = "cat,dog,,horse"; for word in str:gmatch"[^,]*" do print(word) end
cat
dog

horse
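To illustrate the earlier point about implicit captures, here is a minimal sketch comparing a pattern with and without parentheses. The `collect` helper is a hypothetical name introduced only for this example; it is not part of the standard library. It uses `+` rather than `*` so that the result should not depend on the Lua version.

```lua
-- Collect every gmatch result for a pattern into a table.
-- `collect` is a hypothetical helper, not a standard library function.
local function collect(s, pattern)
    local out = {}
    for m in s:gmatch(pattern) do
        out[#out + 1] = m
    end
    return out
end

-- Same character class, once with an explicit capture and once without.
local with_capture = collect("cat,dog,,horse", "([^,]+)")
local without_capture = collect("cat,dog,,horse", "[^,]+")

print(table.concat(with_capture, " "))    -- cat dog horse
print(table.concat(without_capture, " ")) -- cat dog horse
```

Both calls yield the same three words, so the parentheses can be dropped when the whole match is what you want.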
However, there is an issue when you try to run that same code on LuaJIT: it will produce seemingly random empty strings rather than only producing an empty string for two consecutive delimiters (this could be seen as "technically correct" since the empty string is a match for `*`, but I see it as a violation of the greediness of `*`). One solution is to append a delimiter to the string and require each match to end with a delimiter, capturing everything before it:
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
JIT: ON SSE2 SSE3 SSE4.1 AMD BMI2 fold cse dce fwd dse narrow loop abc sink fuse
> local str = "cat,dog,,horse"; for word in (str .. ","):gmatch("(.-),") do print(word) end
cat
dog

horse
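If you do want nil rather than the empty string, the conversion mentioned earlier can be sketched as follows. It assumes the conversion happens inside the loop body: an iterator that itself returned nil for an entry would end the generic `for` loop at that point. The `printed` table is a hypothetical addition that only records what was printed.

```lua
local str = "cat,dog,,horse"
local printed = {} -- records the printed values, for inspection only

for word in (str .. ","):gmatch("(.-),") do
    -- Convert the empty string to nil in the loop body, not in the iterator.
    local entry = word
    if entry == "" then
        entry = nil
    end
    printed[#printed + 1] = tostring(entry)
    print(entry) -- print(nil) writes the text "nil"
end
```

This prints cat, dog, nil and horse on separate lines, which is the output asked for in the question.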
A third option would be to split manually using repeated calls to `string.find`. Here's the utility I wrote myself for that:
function spliterator(str, delim, plain)
    assert(delim ~= "")
    local last_delim_end = 0
    -- Iterator of possibly empty substrings between two matches of the delimiter
    -- To exclude empty strings, filter the iterator or use `:gmatch"[...]+"` instead
    return function()
        if not last_delim_end then
            return
        end
        local delim_start, delim_end = str:find(delim, last_delim_end + 1, plain)
        local substr
        if delim_start then
            substr = str:sub(last_delim_end + 1, delim_start - 1)
        else
            substr = str:sub(last_delim_end + 1)
        end
        last_delim_end = delim_end
        return substr
    end
end
The usage in this example would be
for word in spliterator("cat,dog,,horse", ",") do print(word) end
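The third parameter, `plain`, is passed straight through to `string.find`. As a small sketch of why that matters, the following compares a pattern search with a plain search for a delimiter that happens to be a pattern character. The example string and variable names are made up for illustration.

```lua
local s = "a.b..c"

-- As a pattern, "." matches any character, so the first match is at position 1.
local pattern_start = s:find(".")
-- With plain = true, "." is treated as a literal dot, first found at position 2.
local plain_start = s:find(".", 1, true)
-- Escaping the dot as "%." also finds the literal dot.
local escaped_start = s:find("%.")

print(pattern_start, plain_start, escaped_start)
```

So when splitting on a delimiter such as `.`, you would presumably either pass `true` for `plain` or escape the delimiter yourself.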
Whether you want to add this to the `string` table, keep it in a local variable, or perhaps put it in a `require`d string util module is up to you.