Tags: string, lua, lua-patterns

How to create a word set/class in Lua pattern matching?


I am trying to create a word-set (class) instead of the char-set (class) in Lua.

For Example:

local text = "hello world, hi world, hola world"
print(string.find(text, "[^hello] world"))

In this example, the pattern matches a single character that is not h, e, l, or o, followed by a space and world. What I want instead is a word-set that treats hello as a whole word, so that the pattern finds a part of the string where the space and world do not come right after the word hello.

What I've tried:

local text = "hello world, hi world, hola world"
print(string.find(text, "[^h][^e][^l][^l][^o] world"))

It didn't work for some reason.


Solution

  • I am trying to create a word-set (class) instead of the char-set (class) in Lua.

    This is not possible in the general case. Lua patterns operate at the character level: quantifiers can only be applied to single character classes (a literal character, ., a class such as %a, or a set such as [abc]), and there is no alternation, no "subexpressions", etc. Patterns don't have the expressive power required for this.
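Since there is no alternation, one way to approximate a "word set" is to search for each word separately and keep the earliest hit. A minimal sketch, assuming plain-text words; `find_any` is a hypothetical helper name, not part of Lua's standard library:

```lua
-- Hypothetical helper: returns start index, end index and the matched word
-- for the earliest occurrence of any word in `words` (nil if none occurs).
local function find_any(str, words, init)
    local best_start, best_end, best_word
    for _, word in ipairs(words) do
        -- fourth argument `true` requests a plain search (no pattern magic)
        local s, e = str:find(word, init or 1, true)
        if s and (not best_start or s < best_start) then
            best_start, best_end, best_word = s, e, word
        end
    end
    return best_start, best_end, best_word
end

print(find_any("hello world, hi world, hola world", { "hola", "hi" })) -- 14 15 hi
```

If two words start at the same position, the one listed first in `words` wins.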

    local text = "hello world, hi world, hola world"
    print(string.find(text, "[^h][^e][^l][^l][^o] world"))
    

    What this pattern translates to is: "find world preceded by a space and five characters, where each character may not be the respective character of hello". This means none of the following will match:

    • hi world: Only three characters (h, i and the space) before world, but the pattern needs five characters plus the space
    • hxxxx world: First character is the same as the first character of hello
    • ... hola world: The l from hola is at the same position as the second l from hello
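These cases can be checked directly with string.find. A small sketch; the subject string abcde world is an extra example added here for contrast. Note that on the question's full text the pattern does find something, just not what was intended: the five characters in front of the second " world" are "d, hi", none of which collides with hello.

```lua
local pattern = "[^h][^e][^l][^l][^o] world"

print(("hi world"):find(pattern))       -- nil: subject is shorter than the pattern requires
print(("hxxxx world"):find(pattern))    -- nil: first character is h
print(("... hola world"):find(pattern)) -- nil: l sits in the fourth position
print(("abcde world"):find(pattern))    -- 1 11: no character collides with hello

local text = "hello world, hi world, hola world"
print(text:find(pattern))               -- 11 21: matches "d, hi world"
```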

    To find world not preceded by hello, I would combine multiple calls to string.find to search through the string, always looking for a preceding hello:

    -- str: Subject string to search
    -- needle: String to search for
    -- disallowed_prefix: String that may not immediately precede the needle
    -- plain: Disable pattern matching
    -- init: Start at a certain position
    local function string_find_prefix(str, needle, disallowed_prefix, plain, init)
        init = init or 1
        while true do
            local needle_start, needle_end = str:find(needle, init, plain)
            if not needle_start then return end -- needle not found
            -- everything in front of this occurrence of the needle
            local before = str:sub(1, needle_start - 1)
            local prefixed
            if plain then
                -- compare the tail of `before` literally
                prefixed = before:sub(-#disallowed_prefix) == disallowed_prefix
            else
                -- anchor the prefix pattern to the end of `before`
                prefixed = before:find(disallowed_prefix .. "$") ~= nil
            end
            -- needle may not be prefixed with the disallowed prefix
            if not prefixed then return needle_start, needle_end end
            init = needle_start + 1 -- rejected, continue after this occurrence
        end
    end
    print(string_find_prefix("hello world, hi world, hola world", "world", "hello ")) -- 17 21: Inclusive indices of the `world` after `hi`
    

    See string.find (s, pattern [, init [, plain]]).
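Building on the same idea, here is a sketch that collects every occurrence rather than only the first one. `find_all_without_prefix` is a hypothetical name, and both the needle and the prefix are treated as plain strings, an assumption made so that the prefix check is a simple substring comparison:

```lua
-- Hypothetical helper: returns a list with the start index of every
-- plain-text `needle` not immediately preceded by `disallowed_prefix`.
local function find_all_without_prefix(str, needle, disallowed_prefix)
    local starts = {}
    local init = 1
    while true do
        local s = str:find(needle, init, true) -- plain search
        if not s then break end
        local prefix_start = s - #disallowed_prefix
        -- the guard keeps str:sub from receiving a non-positive start index,
        -- which Lua would otherwise clamp or count from the end of the string
        local prefixed = prefix_start >= 1
            and str:sub(prefix_start, s - 1) == disallowed_prefix
        if not prefixed then
            starts[#starts + 1] = s
        end
        init = s + 1 -- move past this occurrence
    end
    return starts
end

local starts = find_all_without_prefix("hello world, hi world, hola world", "world", "hello ")
print(table.concat(starts, ", ")) -- 17, 29
```

Advancing by `s + 1` instead of past the end of the match also keeps the loop from stalling on an empty needle.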