case-insensitive regular expressions


What's the best way to use regular expressions with options (flags) in Haskell?

I use Text.Regex.PCRE. The documentation lists a few interesting options like compCaseless, compUTF8, ... but I don't know how to use them with (=~).


Solution

  • All the Text.Regex.* modules make heavy use of typeclasses, which provide extensibility and "overloading"-like behavior but make usage hard to work out from the types alone.

    You probably started with the basic =~ matcher.

    (=~) ::
      ( RegexMaker Regex CompOption ExecOption source
      , RegexContext Regex source1 target )
      => source1 -> source -> target
    (=~~) ::
      ( RegexMaker Regex CompOption ExecOption source
      , RegexContext Regex source1 target, Monad m )
      => source1 -> source -> m target
    

    To use =~, there must exist an instance of RegexMaker ... for the RHS (the pattern), and an instance of RegexContext ... for the LHS (the text being searched) and the result.

    class RegexOptions regex compOpt execOpt
          | regex -> compOpt execOpt
          , compOpt -> regex execOpt
          , execOpt -> regex compOpt
    class RegexOptions regex compOpt execOpt
          => RegexMaker regex compOpt execOpt source
             | regex -> compOpt execOpt
             , compOpt -> regex execOpt
             , execOpt -> regex compOpt
      where
        makeRegex :: source -> regex
        makeRegexOpts :: compOpt -> execOpt -> source -> regex
    

    A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt,execOpt options from a source of that type. (Also, given some regex type, there is exactly one compOpt,execOpt pair that goes along with it. Lots of different source types are okay, though.)

    class Extract source
    class Extract source
          => RegexLike regex source
    class RegexLike regex source
          => RegexContext regex source target
      where
        match :: regex -> source -> target
        matchM :: Monad m => regex -> source -> m target
    

    A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
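    As a small illustration of how the target type selects the result, here is a sketch using plain =~ with String on both sides. The names input, hasFoo, countFoo and firstWord are made up for this example; it assumes the regex-pcre package is installed.

```haskell
import Text.Regex.PCRE ((=~))

-- Hypothetical sample text, not from the original question.
input :: String
input = "foo bar foo"

-- target = Bool: does the pattern match anywhere in the input?
hasFoo :: Bool
hasFoo = input =~ "foo"

-- target = Int: how many times does the pattern match?
countFoo :: Int
countFoo = input =~ "foo"

-- target = String: the text of the first match (empty if there is none).
firstWord :: String
firstWord = input =~ "[a-z]+"
```

    The same expression is written each time; only the type annotation on the result changes which RegexContext instance is picked.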

    Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions

    source1 =~ source
      = match (makeRegex source) source1
    source1 =~~ source
      = matchM (makeRegex source) source1
    

    and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.

    You could make your own

    (=~+) ::
       ( RegexMaker regex compOpt execOpt source
       , RegexContext regex source1 target )
       => source1 -> (source, compOpt, execOpt) -> target
    source1 =~+ (source, compOpt, execOpt)
      = match (makeRegexOpts compOpt execOpt source) source1
    (=~~+) ::
       ( RegexMaker regex compOpt execOpt source
       , RegexContext regex source1 target, Monad m )
       => source1 -> (source, compOpt, execOpt) -> m target
    source1 =~~+ (source, compOpt, execOpt)
      = matchM (makeRegexOpts compOpt execOpt source) source1
    

    which could be used like

    "string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool
    

    or replace =~ and =~~ with versions that can accept options

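    -- Note: this probably needs MultiParamTypeClasses and FlexibleInstances,
    -- and because the two instances below overlap, GHC will likely also want
    -- UndecidableInstances and overlapping-instance support.
    import Text.Regex.PCRE hiding ((=~), (=~~))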
    
    class RegexSourceLike regex source
      where
        makeRegexWith :: source -> regex
    instance RegexMaker regex compOpt execOpt source
             => RegexSourceLike regex source
      where
        makeRegexWith = makeRegex
    instance RegexMaker regex compOpt execOpt source
             => RegexSourceLike regex (source, compOpt, execOpt)
      where
        makeRegexWith (source, compOpt, execOpt)
          = makeRegexOpts compOpt execOpt source
    
    source1 =~ source
      = match (makeRegexWith source) source1
    source1 =~~ source
      = matchM (makeRegexWith source) source1
    

    or you could just use match, makeRegexOpts, etc. directly where needed.
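    For the case-insensitive use case from the question, the direct approach might look like the sketch below. The helper names caselessRegex and matchesCaseless are hypothetical; the sketch assumes regex-pcre is installed and combines compCaseless with the library's default compile options using .|. from Data.Bits.

```haskell
import Data.Bits ((.|.))
import Text.Regex.PCRE

-- Compile a pattern with the caseless flag added to the default
-- compile options; execBlank means no special execution options.
caselessRegex :: String -> Regex
caselessRegex pat =
  makeRegexOpts (defaultCompOpt .|. compCaseless) execBlank pat

-- Hypothetical helper: True if the pattern matches the input,
-- ignoring case.
matchesCaseless :: String -> String -> Bool
matchesCaseless pat input = match (caselessRegex pat) input
```

    For example, matchesCaseless "hello" "Say HELLO" should evaluate to True. Keeping the compiled Regex in a binding also avoids recompiling the pattern on every match.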