Tags: lua, string-matching, lua-patterns, wikitext

Lua: Globally match for stuff of the form {{{parameter}}} or {{{parameter|default}}}, capturing "parameter" and "default", without overflow


I'm trying to use Lua pattern matching in a Wikipedia module to locate instances of MediaWiki parameter syntax (e.g. {{{parameter1-a|defaultValue}}} or {{{parameter1-a|{{{alias1-a|defaultValue}}}}}}) so they can be converted into Lua-compatible argument syntax. (Yes, I am fully aware that using pattern matching for this is an unforgivable crime against humanity, but whatever.)

So far, I have this relatively simple pattern, which works fine for the most part:

"{{{([^{}<>|]+)(|?([^{}|]*))}}}"

(Regex equivalent [hopefully], if you want to test on regex101: /{{{([^{}<>|]+)(?:\|([^{}|]+)?)?}}}/g)
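For reference, here is a minimal sketch of what the Lua pattern above captures on the simple inputs it does handle. The sample string and the `results` table are illustrative only. Capture 2 (the optional `|default` part) is what distinguishes a missing default from an explicitly empty one, since capture 3 is the empty string in both cases.

```lua
-- The question's pattern: capture 1 = parameter name,
-- capture 2 = optional "|default" part, capture 3 = the default itself.
local pattern = "{{{([^{}<>|]+)(|?([^{}|]*))}}}"

local sample = "{{{blarg}}} {{{blarg|}}} {{{blarg|default}}}"

local results = {}
for name, pipePart, default in sample:gmatch(pattern) do
  results[#results + 1] = { name = name, pipePart = pipePart, default = default }
  print(name, pipePart, default)
end
```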

However, this can't properly match anything in the "default" part that itself contains curly braces, so I can't include aliases for the parameter or template wikitext in the default. More specifically:

  • The unmodified pattern, given something like {{{parameter|{{{alias|default}}}}}}, only matches the inner {{{alias|default}}}.
  • Removing the curly brace restriction from the default ("{{{([^{}<>|]+)(|?([^|]*))}}}") succeeds with {{{parameter|{{{alias}}}}}}, matching the whole thing with {{{alias}}} as the default, as intended. But with a default on the alias, it matches {{{alias|default}}}}}} instead, capturing default}}} as the default.
  • Just matching everything ("{{{([^{}<>|]+)(|?(.*))}}}") works perfectly with one parameter, but with two it "spills": {{{parameter1|default}}} {{{parameter2}}} yields a single match spanning both, with default}}} {{{parameter2 as the default.
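The three failures above can be reproduced with a short Lua sketch. `allMatches` is a hypothetical helper (not part of the question) that collects each full match with `string.find`, because `gmatch` on a pattern with captures yields only the captures, not how much text was consumed.

```lua
-- Hypothetical helper: return every full match of `pattern` in `text`.
local function allMatches(text, pattern)
  local found = {}
  local init = 1
  while true do
    local s, e = text:find(pattern, init)
    if not s then break end
    found[#found + 1] = text:sub(s, e)
    init = e + 1 -- resume after this match, as gmatch does
  end
  return found
end

local original     = "{{{([^{}<>|]+)(|?([^{}|]*))}}}" -- braces banned in the default
local noBraceLimit = "{{{([^{}<>|]+)(|?([^|]*))}}}"   -- braces allowed in the default
local anything     = "{{{([^{}<>|]+)(|?(.*))}}}"      -- default may be anything

local nested        = "{{{parameter|{{{alias|default}}}}}}"
local nestedNoAlias = "{{{parameter|{{{alias}}}}}}"
local twoParams     = "{{{parameter1|default}}} {{{parameter2}}}"

local r1 = allMatches(nested, original)            -- only the inner parameter
local r2 = allMatches(nestedNoAlias, noBraceLimit) -- the whole thing, as intended
local r3 = allMatches(nested, noBraceLimit)        -- starts at the inner parameter, runs to the end
local r4 = allMatches(twoParams, anything)         -- one match spilling over both parameters

for _, r in ipairs({ r1, r2, r3, r4 }) do
  print(table.concat(r, "  ;  "))
end
```

In every case the culprit is the same: a character class cannot count braces, so the pattern has no way of knowing which `}}}` closes which `{{{`.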

How do I solve this?


Solution

  • This seems like a perfect use case for Lua's special balanced-match pattern item %b! %b{} matches a balanced pair of curly braces, including any braces nested inside it. Surrounding it with two literal braces on each side ({{%b{}}}) therefore matches a complete triple-brace parameter, nested ones included.

    Given your test cases:

    local text = [[
    lorem ipsum dolor sit amet
    {{{blarg}}}
    lorem ipsum dolor sit amet
    {{{blarg|default}}}
    lorem ipsum dolor sit amet
    {{{parameter1-a|{{{alias1-a|defaultValue}}}}}}
    ]]
    

    and using the pattern {{%b{}}} in gmatch:

    for match in text:gmatch"{{%b{}}}" do
        print(match)
    end
    

    you get

    {{{blarg}}}
    {{{blarg|default}}}
    {{{parameter1-a|{{{alias1-a|defaultValue}}}}}}
    

    as expected. You can then further process this parameter:

    local content = match:sub(4, -4) -- cut off the outer {{{ and }}}
    local param, default = content:match"^([^|]+)|(.*)$" -- split at the first |
    if not param then param = content end -- no default
    

    (I've simplified your pattern a bit here, so this isn't exactly equivalent: it is more permissive.)
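Putting these pieces together, a minimal sketch of a full expander might look like the following. `expandParams` and the `args` table are hypothetical: `args` is a plain Lua table standing in for the template's arguments, not a specific Scribunto API, and the sketch assumes values are strings and does not trim whitespace around names. Each default is passed back through the same function, so aliases nest to any depth.

```lua
-- Hypothetical helper: expand {{{name}}} / {{{name|default}}} in `text`
-- using values from `args` (assumed to map names to strings).
local function expandParams(text, args)
  return (text:gsub("{{%b{}}}", function(match)
    local content = match:sub(4, -4)                    -- strip the outer {{{ and }}}
    local name, default = content:match("^([^|]+)|(.*)$") -- split at the first |
    if not name then
      name = content -- no default given
    end
    if args[name] ~= nil then
      -- An explicitly empty value still counts as set, as in MediaWiki.
      return args[name]
    elseif default ~= nil then
      -- The default may itself contain parameters (aliases), so recurse.
      return expandParams(default, args)
    else
      -- No value and no default: leave the parameter as literal text.
      return match
    end
  end))
end

local text = "{{{blarg}}} / {{{blarg|default}}} / {{{parameter1-a|{{{alias1-a|defaultValue}}}}}}"
print(expandParams(text, { ["alias1-a"] = "from alias" }))
-- prints: {{{blarg}}} / default / from alias
```

Using `gsub` with a replacement function (rather than `gmatch`) replaces each parameter in place; the function's return value is inserted verbatim, so `%` characters in argument values need no escaping.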