
How to match a sentence in Lua


I am trying to create a regex which attempts to match a sentence.

Here is a snippet.

local utf8 = require 'lua-utf8'
function matchsent(text)
  local text = text
  for sent in utf8.gmatch(text, "[^\r\n]+\.[\r\n ]") do
    print(sent)
    print('-----')
  end
end

However, it does not work the way it would in Python, for example. I know that Lua uses a different set of patterns and that its regex capabilities are limited, but why does the pattern above give me a syntax error? And what would a sentence-matching pattern in Lua look like?


Solution

  • Note that Lua uses Lua patterns, which are not regular expressions: they have no alternation ("|") and cannot apply quantifiers to groups, so they cannot describe every regular language. They are poorly suited to splitting text into sentences, since you would need to account for abbreviations, spacing, capitalization and so on. Because of that complexity, reliable sentence splitting calls for an NLP package rather than one or two patterns.

    Regarding

    why does the regex above give me a syntax error?

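    In Lua 5.2 and later, \. inside a string literal is an invalid escape sequence, and that is what raises the syntax error. Lua patterns do not use the backslash for escaping: magic characters are escaped with the % symbol, so a literal dot is written as %. in the pattern. For example: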

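    -- Print every run of non-newline characters that ends in a literal
    -- dot followed by a newline, carriage return or space.
    local function matchsent(text)
        -- %. is a literal dot; the [\r\n ] character after it is part of the match.
        -- [^\r\n]+ is greedy, so several sentences on one line come back as one match.
        for sent in string.gmatch(text, '[^\r\n]+%.[\r\n ]') do
            print(sent)
            print("---")
        end
    end
    matchsent("Some text here.\nShow me")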
    

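To make the %-escaping rule concrete, here is a small sketch that uses only the standard string library. The helper escape_pattern is hypothetical (it is not a built-in); it prefixes each of Lua's pattern magic characters ^ $ ( ) % . [ ] * + - ? with % so that an arbitrary string is matched literally.

```lua
-- Hypothetical helper: escape Lua's magic characters ^ $ ( ) % . [ ] * + - ?
-- by prefixing each with "%", so the result matches s literally.
local function escape_pattern(s)
  -- "%%%0" in the replacement means: a literal "%" followed by the whole match.
  -- The extra parentheses keep only gsub's first return value (the string).
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

print(("axb"):find("a.b"))    --> 1  3   (unescaped "." matches any character)
print(("axb"):find("a%.b"))   --> nil    ("%." matches only a literal dot)
print(escape_pattern("3.5+2"))  --> 3%.5%+2
print(("x = 3.5+2"):find(escape_pattern("3.5+2")))  --> 5  9
```

When you only need a literal substring search, string.find also accepts a fourth argument, plain, which turns pattern matching off entirely, e.g. s:find("3.5+2", 1, true).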
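As for what a sentence-matching pattern could look like, a rough sketch follows. It assumes a sentence is a run of characters ending in one or more of ".", "!" or "?", and split_sentences is a hypothetical helper name. The second call illustrates the limitation described in the answer: an abbreviation such as "Dr." is cut off as a sentence of its own.

```lua
-- Naive sentence splitter: a sketch, not a substitute for NLP tooling.
-- Assumption: a sentence is a run of non-terminators followed by one or
-- more of the terminators ".", "!" or "?".
local function split_sentences(text)
  local sentences = {}
  for sent in text:gmatch("[^%.!%?]+[%.!%?]+") do
    -- Trim the whitespace (spaces, newlines) left between sentences.
    sentences[#sentences + 1] = sent:match("^%s*(.-)%s*$")
  end
  return sentences
end

for _, s in ipairs(split_sentences("Hello world. How are you?\nFine!")) do
  print(s)
end
-- Hello world.
-- How are you?
-- Fine!

-- Abbreviations break it: "Dr." is returned as a sentence of its own.
for _, s in ipairs(split_sentences("Dr. Smith arrived. He sat down.")) do
  print(s)
end
-- Dr.
-- Smith arrived.
-- He sat down.
```

Because the terminators are ASCII, the byte-based string functions never split a multi-byte UTF-8 character here, so lua-utf8 is not strictly needed for this pattern. Trailing text with no terminator (such as "Show me" in the question's sample input) is not returned at all.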