I'm working on a simple localization function for my scripts and, although it's starting to work quite well so far, I don't know how to avoid scape/special characters to be shown in UI as part of the text after feeding the widgets with the strings returned by f:read()
.
For example, if in a certain Strings.ES.txt's line I have: Ignorar \"Etiquetas de capa\"
, I'd expect backslashes didn't end showing up just like when I feed the widget with a normal string between doble quotes like: "Ignorar \"Etiquetas de capa\""
, or at least have a way to avoid it. I've been trial-and-erroring with tostring()
and load()
functions and different (surely nonsense 🙄) concatenations like: load(tostring("[[" .. f:read()" .. ]]"))
and such without any success, so here I'm again...
Do someone know if there is a way to get scape characters in a string returned by f:read()
still behave as special as when they are found in a regular one?
I don't know how to avoid [e]scape/special characters to be shown in UI as part of the text
What you want is to "unescape" or "unquote" a string to interpret escape sequences as if it were parsed as a quoted string by Lua.
[...] with the strings returned by
f:read()
[...]
The fact that this string was obtained using f:read()
can be ignored; all that matters is that it is a string literal without quotes using quoted string escapes.
I've been trial-and-erroring with
tostring()
andload()
functions and different [...] concatenations like:load(tostring("[[" .. f:read()" .. ]]"))
and such without any success [...]
This is almost how to do it, except you chose the wrong string literal type: "Long" strings using pairs square brackets ([
and ]
) do not interpret escape sequences at all; they are intended for including long, raw, possibly multiline strings in Lua programs and often come in handy when you need to represent literal strings with backslashes (e.g. regular expressions - not to be confused with Lua patterns, which use %
for escapes, and lack the basic alternation operator of regular expressions).
If you instead use single or double quotes to wrap the string, it will work fine:
local function unescape_string(escaped)
return assert(load(('return "%s"'):format(escaped)))()
end
this will produce a tiny Lua program (a "chunk") for each string, which just consists of return "<contents>"
. Recall that Lua chunks are just functions. Thus you can simply call the function to obtain the value of the string it returns. That way, Lua will interpret the escape sequences for us. The same approach is often used to use Lua for reading data serialized as Lua code.
Note also the use of assert
for error handling: load
returns nil, err
if there is a syntax error. To deal with this gracefully, we can wrap the call to load
in assert
: assert
returns its first argument (the chunk returned by load
) if it is truthy; otherwise, if it is falsy (e.g. nil
in this case), assert
errors, using its second argument as an error message. If you omit the assert
and your input causes a syntax error, you will instead get a cryptic "attempt to call a nil value" error.
You probably want to do additional validation, especially if these escaped strings are user-provided - otherwise a malicious string like str"; os.execute("...")
can trivially invoke a remote code execution (RCE) vulnerability, allowing it to both execute Lua e.g. to block (while 1 do end
), slow down or hijack your application, as well as shell commands using os.execute
. To guard against this, searching for an unescaped closing quote should be sufficient (syntax errors e.g. through invalid escapes will still be possible, but RCE should not be possible excepting Lua interpreter bugs):
local function unescape_string(escaped)
-- match start & end of sequences of zero or more backslashes followed by a double quote
for from, to in escaped:gmatch'()\\*()"' do
-- number of preceding backslashes must be odd for the double quote to be escaped
assert((to - from) % 2 ~= 0, "unescaped double quote")
end
return assert(load(('return "%s"'):format(escaped)))()
end
Alternatively, a more robust (but also more complex) and presumably more efficient way of unescaping this would be to manually implement escape sequences through string.gsub
; that way you get full control, which is more suitable for user-provided input:
-- Single-character backslash escapes of Lua 5.1 according to the reference manual: https://www.lua.org/manual/5.1/manual.html#2.1
local escapes = {a = '\a', b = '\b', f = '\b', n = '\n', r = '\r', t = '\t', v = '\v', ['\\'] = '\\', ["'"] = "'", ['"'] = '"'}
local function unescape_string(escaped)
return escaped:gsub("\\(.)", escapes)
end
you may implement escapes here as you see fit; for example, this misses decimal escapes, which could easily be implemented as escaped:gsub("\\(%d%d?%d?)", string.char)
(this uses coercion of strings to numbers in string.char
and a replacement function as second argument to string.gsub
).
This function can finally be used straightforwardly as unescape_string(f:read())
.