
Equivalent pattern to "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" in Lua 5.1


While answering another question, I wrote this code to iterate over the UTF-8 byte sequences in a string:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do 
    print(c) 
end

It works in Lua 5.2, but in Lua 5.1, it reports an error:

malformed pattern (missing ']')

I recall that Lua 5.1 does not support the \xhh escape in string literals, so I switched to decimal escapes:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\127\194-\244][\128-\191]*") do 
    print(c) 
end

But the error stays the same. How can I fix it?


Solution

  • See the Lua 5.1 manual on patterns.

    A pattern cannot contain embedded zeros. Use %z instead.
    

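    Lua 5.1 treats the \0 as the end of the pattern, so the pattern effectively stops right after the opening bracket, which is why the error complains about a missing ']'. Lua 5.2 changed this so that a pattern may contain \0, but 5.1 does not allow it. To fix it, add %z to the first set and change the first range to \1-\127.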
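Putting that together, here is a minimal sketch of the corrected loop for Lua 5.1. The helper name utf8_chars is only for illustration and is not part of any library. The sample string spells Č with decimal byte escapes (\196\140) so the result does not depend on how the source file is encoded.

```lua
-- Pattern for one UTF-8 character, written so that Lua 5.1 accepts it:
-- %z stands in for the zero byte, and decimal escapes replace \xhh.
local utf8_char = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: collect the UTF-8 characters of s into a table.
local function utf8_chars(s)
  local out = {}
  for c in s:gmatch(utf8_char) do
    out[#out + 1] = c
  end
  return out
end

-- "KORYTNAČKA" with Č written as its two UTF-8 bytes.
local chars = utf8_chars("KORYTNA\196\140KA")
print(#chars)  -- 10 characters (the string is 11 bytes long)
for _, c in ipairs(chars) do
  print(c)
end
```

Note that the Lua 5.2 manual marks %z as deprecated, so on newer versions the original \0 form is the preferred spelling; the %z form is what 5.1 needs.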