
Case-insensitive matching in LPeg.re (Lua)


I'm new to the "LPeg" and "re" modules of Lua. I want to write a pattern that follows these rules:

  1. Match a string that starts with one of "gv_$/gv$/v$/v_$/x$/xv$/dba_/all_/cdb_", optionally preceded by the owner prefix "SYS.%s*" or "PUBLIC.%s*"
  2. The string must not be preceded by an alphanumeric character; e.g., the pattern should not match "XSYS.DBA_OBJECTS" because the name follows "X"
  3. The pattern must be case-insensitive

For example, the following strings should match the pattern:

,sys.dba_objects,       --should return  "sys.dba_objects"    
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot)      --should return  "dba_hist_snapshot"   

My current pattern is below; it only matches a non-alphanumeric character followed by an upper-case string:

local re = require "re"
p=re.compile[[
         pattern <- %W {owner* name}
         owner   <- 'SYS.'/ 'PUBLIC.'
         name    <- {prefix %a%a (%w/"_"/"$"/"#")+}
         prefix  <- "GV_$"/"GV$"/"V_$"/"V$"/"DBA_"/"ALL_"/"CDB_"
      ]]
print(p:match(",SYS.DBA_OBJECTS")) 

My questions are:

  1. How can I make the matching case-insensitive? There are some existing topics about this, but I'm too new to LPeg to understand them.
  2. How can I return only the matched string, without the leading %W character? I'm looking for something like "(?=...)" in Java regular expressions.

I'd appreciate it if you could provide the pattern or a related function.


Solution

  • You can try adapting this grammar:

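    local re = require're'
    
    -- Rules A..Y match one letter in either case, since re has no
    -- case-insensitive option. The [_$#%w]+ alternative in `pattern` skips a
    -- whole word at once, so a name cannot start in the middle of a word
    -- such as "xdba_objects" (rule 2 of the question).
    local p = re.compile[[
        pattern <- ((s? { <name> }) / s / [_$#%w]+ / .)* !.
        name    <- (<owner> s? '.' s?)? <prefix> <ident>
        owner   <- (S Y S) / (P U B L I C)
        prefix  <- (G V '_'? '$') / (V '_'? '$') / (X V? '$') / (D B A '_') / (A L L '_') / (C D B '_')
        ident   <- [_$#%w]+
        s       <- (<comment> / %s)+
        comment <- '--' (!%nl .)*
        A       <- [aA]
        B       <- [bB]
        C       <- [cC]
        D       <- [dD]
        G       <- [gG]
        I       <- [iI]
        L       <- [lL]
        P       <- [pP]
        S       <- [sS]
        U       <- [uU]
        V       <- [vV]
        X       <- [xX]
        Y       <- [yY]
        ]]
    local m = { p:match[[
    ,sys.dba_objects,       --should return  "sys.dba_objects"
    SyS.Dba_OBJECTS
    cdb_objects
    dba_hist_snapshot)      --should return  "dba_hist_snapshot"
    ]] }
    -- unpack is a global in Lua 5.1 and table.unpack in Lua 5.2+
    print((table.unpack or unpack)(m))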
    

    ...which prints the match table m:

    sys.dba_objects SyS.Dba_OBJECTS cdb_objects     dba_hist_snapshot
    

    Note that case-insensitivity is hard to achieve outside a lexer (re has no case-insensitive option), so each letter gets its own rule such as A <- [aA]; you'll need more of these as you add prefixes.
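    If writing one rule per letter gets tedious, one option is to generate the case-insensitive pieces in Lua with plain LPeg and hand them to re.compile through its optional definitions table, where each entry is referenced as %name. This is a minimal sketch, assuming ASCII letters only; the helper names ipattern and anycase are hypothetical, not part of LPeg or re.

    ```lua
    local lpeg = require "lpeg"
    local re = require "re"

    -- Hypothetical helper: match the literal `str` ignoring ASCII letter case.
    -- Each letter becomes a two-character set (e.g. "d" -> [dD]); other
    -- characters such as "_" and "$" are matched literally.
    local function ipattern(str)
      local p = lpeg.P(true)            -- always succeeds, consumes nothing
      for c in str:gmatch(".") do
        local lo, up = c:lower(), c:upper()
        if lo ~= up then
          p = p * lpeg.S(lo .. up)
        else
          p = p * lpeg.P(c)
        end
      end
      return p
    end

    -- Hypothetical helper: ordered choice of case-insensitive literals.
    local function anycase(list)
      local p = lpeg.P(false)           -- always fails; neutral element for "+"
      for _, s in ipairs(list) do
        p = p + ipattern(s)
      end
      return p
    end

    local defs = {
      owner  = anycase{ "sys", "public" },
      prefix = anycase{ "gv_$", "gv$", "v_$", "v$", "xv$", "x$", "dba_", "all_", "cdb_" },
    }

    -- %owner and %prefix are looked up in `defs`; %s is re's predefined space.
    local name = re.compile([[
      name  <- { (%owner %s* '.' %s*)? %prefix ident }
      ident <- [_$#%w]+
    ]], defs)

    print(name:match("SyS.Dba_OBJECTS"))   --> SyS.Dba_OBJECTS
    print(name:match("GV$session"))        --> GV$session
    ```

    Because LPeg's + (like re's /) is ordered choice, a literal that is a prefix of another literal would have to be listed after it; none of the prefixes in this list is a prefix of another, so their order does not matter here.
    
    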

    This grammar also handles the comments in your sample: it skips them along with whitespace, so the strings after "should return" do not appear in the output.

    You can adjust the prefix and ident rules to add more prefixes or more characters allowed in object names.

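    Note: !. means end of input, and !%nl means "not at end of line". !p and &p build non-consuming patterns (predicates): the input position does not advance when they match; the input is only tested. They play the same role as lookahead assertions such as (?=...) and (?!...) in Java regular expressions.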
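    As a minimal sketch of these predicates, the first pattern below captures a word only when a "(" follows it, without including the "(" in the capture. The second part targets rule 2 of the question with lpeg.B(patt), LPeg's lookbehind, which succeeds only if the text just before the current position matches patt (patt must have a fixed length and contain no captures). The sketch negates it and passes it to re.compile through the definitions table as %boundary, a hypothetical name; it assumes only ASCII letters and digits count as alphanumeric.

    ```lua
    local lpeg = require "lpeg"
    local re = require "re"

    -- &'(' only tests the next character, so "(" is not part of the capture.
    local call = re.compile[[ { [%a_]+ } &'(' ]]
    print(call:match("len(x)"))   --> len
    print(call:match("len x"))    --> nil

    -- Succeeds only if the previous character is NOT alphanumeric. At the
    -- start of the subject there is no previous character, so lpeg.B fails
    -- and its negation succeeds.
    local defs = { boundary = -lpeg.B(lpeg.R("az", "AZ", "09")) }

    -- Scan the whole subject and collect every dba_ name (any case) that
    -- starts at such a boundary; {| |} gathers the captures into a table.
    local scan = re.compile([[
      all <- {| (hit / .)* |}
      hit <- %boundary { [dD][bB][aA] '_' [_$#%w]+ }
    ]], defs)

    local names = scan:match(",dba_x yDBA_y Dba_Z")
    print(table.concat(names, " "))   --> dba_x Dba_Z
    ```
    
    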

    Note 2: printing with unpack is a gross hack (in Lua 5.2 and later, unpack is available as table.unpack).
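    A less hacky alternative, sketched here under the assumption that a single table of results is what you want, is re's table capture {| ... |}, which collects every capture produced inside it into one Lua table that table.concat can then print. The grammar is a deliberately trimmed stand-in (only the DBA_ and CDB_ prefixes, with no owner, comment handling, or word-boundary check).

    ```lua
    local re = require "re"

    -- {| ... |} gathers every capture produced inside it into one table.
    local p = re.compile[[
      list   <- {| (name / .)* |}
      name   <- { prefix [_$#%w]+ }
      prefix <- [dD][bB][aA] '_' / [cC][dD][bB] '_'
    ]]

    local m = p:match(",dba_objects, cdb_objects)")
    print(table.concat(m, ", "))   --> dba_objects, cdb_objects
    ```
    
    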

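    Note 3: A traceable variant of LPeg re exists that can be used to debug grammars: passing true as the third parameter of its re.compile produces an execution trace showing the test/match/skip action on each rule and position visited. (The standard re.compile takes only the pattern string and an optional definitions table.)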