Tags: regex, lua, lua-5.1, lpeg

Match repeatable string as a "whole word" in Lua 5.1


My environment:

  • Lua 5.1
  • Absolutely no libraries with a native component (like a C .so/.dll) can be used
  • I can run any arbitrary pure Lua 5.1 code, but I can't access os and several other packages that would allow access to the native filesystem, shell commands or anything like that, so all functionality must be implemented in Lua itself (only).
  • I've already managed to pull in LuLpeg. I can probably pull in other pure Lua libraries.

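I need to write a function that returns true if the input string contains a given word (an arbitrary sequence of letters and numbers) repeated one or more times back to back as a whole word, where the repeated run may have punctuation immediately before or after it. I use "whole word" roughly in the sense of the PCRE word boundary \b, except that the run, including any surrounding punctuation, must be delimited by whitespace or the ends of the string (so "I'd" must not match d, as in the examples below).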

To demonstrate the idea, here's an incorrect attempt using the re module of LuLpeg; it seems to work with negative lookaheads but not negative lookbehinds:

function containsRepeatingWholeWord(input, word)
    return re.match(input:gsub('[%a%p]+', ' %1 '), '%s*[^%s]^0{"' .. word .. '"}+[^%s]^0%s*') ~= nil
end

Here are example inputs and the expected return values (the quotes are Lua string-literal syntax, as if typed into the interpreter, not part of the strings themselves; they are there to make leading and trailing spaces visible):

  • input: " one !tvtvtv! two", word: tv, return value: true
  • input: "I'd", word: d, return value: false
  • input: "tv", word: tv, return value: true
  • input: " tvtv! ", word: tv, return value: true
  • input: " epon ", word: nope, return value: false
  • input: " eponnope ", word: nope, return value: false
  • input: "atv", word: tv, return value: false

If I had a full PCRE regex library I could do this quickly, but I don't because I can't link to C, and I haven't found any pure Lua implementations of PCRE or similar.

I'm not certain whether LPeg is flexible enough (used directly or through its re module) to do what I want, but I'm fairly sure the built-in Lua string functions can't, because Lua patterns can't repeat a multi-character sequence: (tv)+ does not work with Lua's built-in string.match and similar functions.
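For example, in a Lua pattern a quantifier such as + applies only to the single character class directly before it, and parentheses only mark a capture, so "(tv)+" is read as "capture tv, then match a literal +". A quick check in a plain Lua 5.1 interpreter:

```lua
-- "(tv)+" means: capture "tv", then match a literal "+" (not "one or more tv")
print(("tvtv"):match("(tv)+"))  --> nil
print(("tv+"):match("(tv)+"))   --> tv

-- Repetition does work when the repeated item is a single character class
print(("xxxy"):match("x+"))     --> xxx
```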

Interesting resources I've been scouring to try to figure out how to do this, to no avail:


Solution

  • Lua patterns are powerful enough.
    No LPEG is needed here.

    This is your function

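    function f(input, word)
       -- escape pattern magic characters in word, then mark each occurrence with "\0"
       local marked = input:gsub(word:gsub("%p", "%%%0"), "\0")
       -- space-delimited run of "\0" markers, optionally wrapped in punctuation
       return (" " .. marked .. " "):find("%s%p*%z+%p*%s") ~= nil
    end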
    

    This is a test of the function

    for _, t in ipairs{
       {input = " one !tvtvtv! two", word = "tv", return_value = true},
       {input = "I'd", word = "d", return_value = false},
       {input = "tv", word = "tv", return_value = true},
       {input = "   tvtv!  ", word = "tv", return_value = true},
       {input = " epon ", word = "nope", return_value = false},
       {input = " eponnope ", word = "nope", return_value = false},
       {input = "atv", word = "tv", return_value = false},
    } do
       assert(f(t.input, t.word) == t.return_value)
    end
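One caveat with f: it assumes the input never contains a "\0" byte of its own. For instance, f("x \0 y", "tv") returns true, because the pre-existing "\0" is indistinguishable from a substituted occurrence of the word. If such input is possible, a slightly longer pure-Lua approach avoids the marker altogether: split the input on whitespace, strip punctuation from both ends of each token, and check whether what remains is the word repeated back to back. This is an illustrative sketch, not part of the original answer; the helper isRepetitionOf is a hypothetical name, and it assumes "punctuation" means Lua's %p class and that a whole word is delimited by whitespace or the ends of the string, as in the examples.

```lua
-- Sketch (assumption: punctuation = %p, word boundaries = whitespace or string ends).

-- Hypothetical helper: true if s is one or more back-to-back copies of word.
local function isRepetitionOf(s, word)
   if word == "" or #s == 0 or #s % #word ~= 0 then
      return false
   end
   for i = 1, #s, #word do
      if s:sub(i, i + #word - 1) ~= word then
         return false
      end
   end
   return true
end

function containsRepeatingWholeWord(input, word)
   -- every maximal run of non-whitespace characters is a candidate token
   for token in input:gmatch("%S+") do
      -- drop punctuation at both ends of the token, keep the middle
      local core = token:match("^%p*(.-)%p*$")
      if isRepetitionOf(core, word) then
         return true
      end
   end
   return false
end
```

Because the word is compared with plain string equality rather than used as a pattern, it needs no escaping of magic characters.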