
Fully-featured regex library in pure Lua


I am writing an Elder Scrolls Online (ESO) addon. ESO addons run on a lightly modified Lua 5.1 engine called Havok Script. This Lua environment does not allow access to the os, io, package, or debug modules or to any native platform bindings, and there is no way around this limitation because ESO is proprietary software.

In this restricted environment I need a feature-complete regex engine with lookaround functionality (negative and positive lookahead and lookbehind). Performance is nearly irrelevant, but convenience is a top concern (I don't have the time or ability to write my own regex engine).

The exact syntax of the regex engine matters less than its feature set. PCRE, JavaScript, Java, or .NET regex syntax, or even something a little different, would probably be fine. POSIX regex is too limited because it doesn't support lookaround at all.

The regexes will be unverified user input, but the environment is effectively a sandbox, so users can't do anything malicious with them. Because the patterns come from users, I can't "just" use something like LPeg; the user base would absolutely object to learning an entirely new concept like LPeg instead of the relatively familiar regex syntax.

When looking for Lua regex engines, I've exhausted a number of options:

  • Native bindings, such as lregexp and other libpcre Lua bindings. These cannot work for my use case because the environment has no access to the native platform, so they're out.
  • reLua, which supports the basic "regular" patterns like alternation and greedy closures, but has no lookaround at all. I don't have the ability to add lookaround to this project, so unless a fork exists that adds it, I can't use this.
  • Transpiling a full regex engine written in pure JavaScript (one that doesn't use JS's built-in regex functions) into Lua using castl. This was somewhat promising, but I hit a fatal flaw in castl, and apparently also in tessel: Lua limits each function to 200 local variables, and neither transpiler has a way around that. To avoid the limit, a transpiler would have to declare a single local table, store all the data in that table, and rewrite every local variable access in the original JS as a table access in Lua. Because this is such a fundamental problem, I'm not sure this approach can work, but maybe the closest thing to a solution is to find some way past this limit?
  • I've looked for transpilers from other languages than JS, but I found none. Basically the only "X language into pure Lua" compilers I could find were castl and tessel.
  • I also (in desperation) tried compiling an up-to-date libpcre to JavaScript with Emscripten (for the uninitiated: Emscripten compiles C code into JS) and then transpiling the result into Lua using castl. Running that code produces an even stranger Lua error: the interpreter can't find a label it has been asked to goto, even though the label clearly exists in the code. My only guess is that the generated code is so large that Lua gives up looking for it.
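
A minimal sketch of the table-based workaround described in the castl bullet, written for stock Lua 5.1 or later (not verified against Havok Script). It generates source the way a naive transpiler might, with one `local` per variable, shows that the chunk fails to compile past the 200-local limit, and then compiles the same program with every former local stored as a field of one table. The helpers `with_locals` and `with_table` are hypothetical illustrations, not part of castl or tessel; real transpiler output would also need the same rewrite inside every nested function, since the limit applies per function.

```lua
-- loadstring exists in Lua 5.1/LuaJIT; in Lua 5.2+ load accepts a string.
local load_chunk = loadstring or load

-- Hypothetical helper: emit a chunk with `n` separate locals, like a naive transpiler.
local function with_locals(n)
  local lines = {}
  for i = 1, n do
    lines[#lines + 1] = ("local v%d = %d"):format(i, i)
  end
  lines[#lines + 1] = ("return v1 + v%d"):format(n)
  return table.concat(lines, "\n")
end

-- Hypothetical helper: the same program, but every former local is a field of one table V.
local function with_table(n)
  local lines = { "local V = {}" }
  for i = 1, n do
    lines[#lines + 1] = ("V.v%d = %d"):format(i, i)
  end
  lines[#lines + 1] = ("return V.v1 + V.v%d"):format(n)
  return table.concat(lines, "\n")
end

-- 250 locals exceeds the per-function limit, so compilation fails.
local f1, err = load_chunk(with_locals(250))
print(f1, err)  -- nil, plus a compile error mentioning the 200 local variable limit

-- The table version uses a single local, so it compiles and runs.
local f2 = assert(load_chunk(with_table(250)))
print(f2())     --> 251
```

The trade-off is that every variable access becomes a table lookup, which matches the question's note that performance is nearly irrelevant.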

At the moment I've reached an impasse and don't know how to get the functionality I want. Is there a pure Lua, fully featured regex engine that I just haven't found yet? I gave up around the seventh or eighth page of Google results.


Solution

  • Depending on your exact requirements, you could try the re module of LPeg. The clear advantage is that it is available basically everywhere Lua is available. On the other hand, keep in mind that re uses its own regex-like syntax built on parsing expression grammars (PEGs), so it is not compatible with, e.g., POSIX. However, as long as the expressions are simple enough, you should not notice.
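
A hedged sketch of how the lookaround cases from the question might map onto re, assuming LPeg and its bundled re.lua can be loaded with `require` (LPeg is a C module, so whether this holds in ESO is exactly the open question). In re, the prefix `!p` is a negative predicate and `&p` a positive one, which act as lookahead. I'm not aware of re syntax for lookbehind, so the last case drops down to `lpeg.B`, which only accepts fixed-length patterns.

```lua
-- Assumes LPeg and re.lua are installed and loadable.
local lpeg = require("lpeg")
local re = require("re")

-- Regex foo(?!bar): !'bar' succeeds only if "bar" does NOT follow.
local s1, e1 = re.find("foobar foobaz", "'foo' !'bar'")
print(s1, e1)  --> 8  10  (the second "foo")

-- Regex \d+(?=px): &'px' succeeds only if "px" follows, without consuming it.
local s2, e2 = re.find("height: 25px", "[0-9]+ &'px'")
print(s2, e2)  --> 9  10

-- Regex (?<=\$)\d+: lookbehind via lpeg.B (fixed-length patterns only).
local after_dollar = lpeg.B("$") * lpeg.R("09")^1
-- Unanchored search: try to match here, otherwise skip one character and retry.
local search = lpeg.P{ lpeg.Cp() * after_dollar * lpeg.Cp() + 1 * lpeg.V(1) }
local s3, stop = search:match("cost: $42")
print(s3, stop - 1)  --> 8  9
```

One design difference worth knowing: PEG repetition is greedy and possessive, with no backtracking into a repetition, so a regex like `a*a` has no direct re equivalent. The pattern `'a'* 'a'` never matches, because `'a'*` consumes every `a` and never gives one back.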