Search code examples
stringlua

Can I get scape characters still behave as such for a string provided by f:read() in Lua?


I'm working on a simple localization function for my scripts and, although it's starting to work quite well so far, I don't know how to avoid scape/special characters to be shown in UI as part of the text after feeding the widgets with the strings returned by f:read().

For example, if in a certain Strings.ES.txt's line I have: Ignorar \"Etiquetas de capa\", I'd expect backslashes didn't end showing up just like when I feed the widget with a normal string between doble quotes like: "Ignorar \"Etiquetas de capa\"", or at least have a way to avoid it. I've been trial-and-erroring with tostring() and load() functions and different (surely nonsense 🙄) concatenations like: load(tostring("[[" .. f:read()" .. ]]")) and such without any success, so here I'm again...

Do someone know if there is a way to get scape characters in a string returned by f:read() still behave as special as when they are found in a regular one?


Solution

  • I don't know how to avoid [e]scape/special characters to be shown in UI as part of the text

    What you want is to "unescape" or "unquote" a string to interpret escape sequences as if it were parsed as a quoted string by Lua.

    [...] with the strings returned by f:read() [...]

    The fact that this string was obtained using f:read() can be ignored; all that matters is that it is a string literal without quotes using quoted string escapes.

    I've been trial-and-erroring with tostring() and load() functions and different [...] concatenations like: load(tostring("[[" .. f:read()" .. ]]")) and such without any success [...]

    This is almost how to do it, except you chose the wrong string literal type: "Long" strings using pairs square brackets ([ and ]) do not interpret escape sequences at all; they are intended for including long, raw, possibly multiline strings in Lua programs and often come in handy when you need to represent literal strings with backslashes (e.g. regular expressions - not to be confused with Lua patterns, which use % for escapes, and lack the basic alternation operator of regular expressions).

    If you instead use single or double quotes to wrap the string, it will work fine:

    local function unescape_string(escaped)
        return assert(load(('return "%s"'):format(escaped)))()
    end
    

    this will produce a tiny Lua program (a "chunk") for each string, which just consists of return "<contents>". Recall that Lua chunks are just functions. Thus you can simply call the function to obtain the value of the string it returns. That way, Lua will interpret the escape sequences for us. The same approach is often used to use Lua for reading data serialized as Lua code.

    Note also the use of assert for error handling: load returns nil, err if there is a syntax error. To deal with this gracefully, we can wrap the call to load in assert: assert returns its first argument (the chunk returned by load) if it is truthy; otherwise, if it is falsy (e.g. nil in this case), assert errors, using its second argument as an error message. If you omit the assert and your input causes a syntax error, you will instead get a cryptic "attempt to call a nil value" error.

    You probably want to do additional validation, especially if these escaped strings are user-provided - otherwise a malicious string like str"; os.execute("...") can trivially invoke a remote code execution (RCE) vulnerability, allowing it to both execute Lua e.g. to block (while 1 do end), slow down or hijack your application, as well as shell commands using os.execute. To guard against this, searching for an unescaped closing quote should be sufficient (syntax errors e.g. through invalid escapes will still be possible, but RCE should not be possible excepting Lua interpreter bugs):

    local function unescape_string(escaped)
        -- match start & end of sequences of zero or more backslashes followed by a double quote
        for from, to in escaped:gmatch'()\\*()"' do
            -- number of preceding backslashes must be odd for the double quote to be escaped
            assert((to - from) % 2 ~= 0, "unescaped double quote")
        end
        return assert(load(('return "%s"'):format(escaped)))()
    end
    

    Alternatively, a more robust (but also more complex) and presumably more efficient way of unescaping this would be to manually implement escape sequences through string.gsub; that way you get full control, which is more suitable for user-provided input:

    -- Single-character backslash escapes of Lua 5.1 according to the reference manual: https://www.lua.org/manual/5.1/manual.html#2.1
    local escapes = {a = '\a', b = '\b', f = '\b', n = '\n', r = '\r', t = '\t', v = '\v', ['\\'] = '\\', ["'"] = "'", ['"'] = '"'}
    local function unescape_string(escaped)
        return escaped:gsub("\\(.)", escapes)
    end
    

    you may implement escapes here as you see fit; for example, this misses decimal escapes, which could easily be implemented as escaped:gsub("\\(%d%d?%d?)", string.char) (this uses coercion of strings to numbers in string.char and a replacement function as second argument to string.gsub).

    This function can finally be used straightforwardly as unescape_string(f:read()).