
Match a word or whitespace in Lua


(Sorry for my broken English)
What I'm trying to do is match either a word (with or without numbers and special characters) or whitespace characters (spaces, tabs, optionally newlines) in a string in Lua. For example:

local my_string = "foo bar"
my_string:match(regex)    --> should return 'foo', ' ', 'bar'

my_string = "   123!@."     -- note: three spaces before '123!@.'
my_string:match(regex)    --> should return ' ', ' ', ' ', '123!@.'

Here, regex is the Lua pattern I'm asking for. Of course I've searched on Google, but I couldn't find anything useful. What I've tried so far is [%s%S]+ and [%s+%S+], but neither seems to work.

Any solution using the standard library (e.g. string.find, string.gmatch, etc.) is OK.


Solution

  • match returns either the captures or, if the pattern defines none, the whole match; your patterns don't define any captures. [%s%S]+ means "(a space or a non-space) one or more times", which matches basically everything. [%s+%S+] is plain wrong: the character class [ ] is a set of single-character members. It does not treat sequences of characters in any special way ("[cat]" matches "c", "a" or "t", not "cat"), nor does it care about +, which is just a literal member inside the set. So [%s+%S+] means "a single character that is a space, a plus sign or a non-space", i.e. any single character.
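
A quick way to see what the two attempted patterns actually do is to run them against the first example string. This is a small illustrative check; the variable names are arbitrary.

```lua
local s = "foo bar"

-- The pattern has no captures, so match returns the whole matched text.
-- [%s%S] matches any character, so [%s%S]+ consumes the entire string.
local whole = s:match("[%s%S]+")
print(whole)  --> foo bar

-- [%s+%S+] is one set: a space, '+', a non-space, or '+' again.
-- With no quantifier after the set, it matches exactly one character.
local one = s:match("[%s+%S+]")
print(one)    --> f
```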

    The first example 'foo', ' ', 'bar' could be solved by:

    regex="(%S+)(%s)(%S+)"
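
Applied to both example strings, this fixed three-capture pattern returns one value per capture for the first string, but it cannot handle the second one. A minimal check:

```lua
local pattern = "(%S+)(%s)(%S+)"

-- Three captures, so match returns three values.
local a, b, c = ("foo bar"):match(pattern)
print(a, b, c)

-- The pattern requires word + one whitespace character + word, so a string
-- with leading spaces and a single word does not match at all.
local x = ("   123!@."):match(pattern)
print(x)  --> nil
```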
    

    If you want a variable number of captures, you'll need the string.gmatch iterator:

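    local capt = {}
    -- Each match is: optional leading whitespace, a word, optional trailing whitespace.
    for q, w, e in my_string:gmatch("(%s*)(%S+)(%s*)") do
      -- Captures are always strings (possibly empty), so only the length needs checking.
      if #q > 0 then
        table.insert(capt, q)
      end
      table.insert(capt, w)
      if #e > 0 then
        table.insert(capt, e)
      end
    end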
    

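    This does not, however, split a run of whitespace into individual characters (the second example yields '   ' and '123!@.'), and a string that contains only whitespace produces no matches at all, because %S+ requires at least one non-space character. You'll need to add those checks when processing the match results.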
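
One way to get exactly the output asked for in the question (each whitespace character separately, each word whole) with only the standard library is to walk the string by position instead of using gmatch. This is a sketch, not the answer's original code; the function name tokenize is hypothetical. It relies on the fact that a ^ at the start of a pattern anchors string.match at the given init position.

```lua
-- Hypothetical helper: splits s into single whitespace characters and
-- maximal runs of non-whitespace characters, in order.
local function tokenize(s)
  local tokens = {}
  local pos = 1
  while pos <= #s do
    local c = s:sub(pos, pos)
    if c:match("%s") then
      -- Whitespace: emit one character at a time.
      tokens[#tokens + 1] = c
      pos = pos + 1
    else
      -- Non-whitespace: "^" anchors the match at pos, so this grabs the whole word.
      local word = s:match("^%S+", pos)
      tokens[#tokens + 1] = word
      pos = pos + #word
    end
  end
  return tokens
end

for _, tok in ipairs(tokenize("   123!@.")) do
  io.write(string.format("%q ", tok))
end
print()
```

Because every character is either %s or %S, the loop always advances, and whitespace-only or empty strings are handled without special cases.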

    Lua's standard patterns are deliberately simple; if you need more intricate matching, you might want to have a look at the LPeg library.
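
For comparison, the same tokenization can be sketched with LPeg, where ordered choice (+) expresses "a whitespace character or a word" directly. This assumes LPeg is installed separately (it is not part of the standard library), so the sketch loads it with pcall and skips the example if it is missing; the name lpeg_tokenize is hypothetical.

```lua
-- Assumption: LPeg is installed (e.g. via LuaRocks); it is not in the standard library.
local lpeg_ok, lpeg = pcall(require, "lpeg")

local lpeg_tokenize  -- stays nil if LPeg is unavailable
if lpeg_ok then
  -- Same characters as Lua's %s class in the C locale.
  local space = lpeg.S(" \t\n\v\f\r")
  -- A word is one or more characters that are not whitespace.
  local word = (1 - space)^1
  -- Each token is a single whitespace character or a whole word;
  -- lpeg.Ct collects all captures into a table.
  local tokens = lpeg.Ct((lpeg.C(space) + lpeg.C(word))^0)
  lpeg_tokenize = function(s) return tokens:match(s) end
  print(table.concat(lpeg_tokenize("   123!@."), "|"))
else
  print("LPeg is not installed; skipping the example")
end
```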