What's the best way to use regular expressions with options (flags) in Haskell?
I use Text.Regex.PCRE. The documentation lists a few interesting options like compCaseless, compUTF8, ... but I don't know how to use them with (=~).
All the Text.Regex.* modules make heavy use of typeclasses, which are there for extensibility and "overloading"-like behavior, but make usage less obvious from just seeing types.
Now, you've probably started off with the basic =~ matcher.
(=~) ::
( RegexMaker Regex CompOption ExecOption source
, RegexContext Regex source1 target )
=> source1 -> source -> target
(=~~) ::
( RegexMaker Regex CompOption ExecOption source
, RegexContext Regex source1 target, Monad m )
=> source1 -> source -> m target
To use =~, there must exist an instance of RegexMaker ... for the RHS (the pattern), and RegexContext ... for the LHS (the text being searched) and the result.
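For reference, a minimal sketch of the basic operators in use (the helper names are hypothetical). The text being searched goes on the left, the pattern on the right, and the result type you ask for selects the RegexContext instance: Bool for "did it match", String for the first match (empty when there is none), and Maybe through =~~, which fails when nothing matches.

```haskell
import Text.Regex.PCRE ((=~), (=~~))

-- Bool target: does the pattern occur anywhere in the text?
hasDigit :: String -> Bool
hasDigit s = s =~ "[0-9]"

-- String target: the first match, or "" when there is none.
firstWord :: String -> String
firstWord s = s =~ "[A-Za-z]+"

-- Maybe target via (=~~): Nothing when there is no match.
firstNumber :: String -> Maybe String
firstNumber s = s =~~ "[0-9]+"

main :: IO ()
main = do
  print (hasDigit "abc123")          -- True
  print (firstWord "  hello world")  -- "hello"
  print (firstNumber "no digits")    -- Nothing
  print (firstNumber "x = 42;")      -- Just "42"
```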
class RegexOptions regex compOpt execOpt | ...
class RegexOptions regex compOpt execOpt
=> RegexMaker regex compOpt execOpt source
| regex -> compOpt execOpt
, compOpt -> regex execOpt
, execOpt -> regex compOpt
where
makeRegex :: source -> regex
makeRegexOpts :: compOpt -> execOpt -> source -> regex
A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it's possible to compile a regex with compOpt and execOpt options from a pattern of type source. (Also, given some regex type, there is exactly one compOpt, execOpt pair that goes along with it. Lots of different source types are okay, though.)
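makeRegexOpts is where compCaseless and the other flags plug in. A minimal sketch, assuming (as I understand regex-base) that makeRegex compiles with defaultCompOpt and defaultExecOpt; since makeRegexOpts replaces those defaults rather than adding to them, this ORs compCaseless onto defaultCompOpt using CompOption's Bits instance. The names caseless and plain are hypothetical.

```haskell
import Data.Bits ((.|.))
import Text.Regex.PCRE

-- Compile with the default options plus case-insensitivity.
-- The result type Regex fixes compOpt = CompOption and execOpt = ExecOption.
caseless :: String -> Regex
caseless = makeRegexOpts (defaultCompOpt .|. compCaseless) defaultExecOpt

-- Same pattern compiled with makeRegex, i.e. only the default options.
plain :: String -> Regex
plain = makeRegex

main :: IO ()
main = do
  print (matchTest (caseless "hello") "Say HELLO!")  -- True
  print (matchTest (plain "hello") "Say HELLO!")     -- False
```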
class Extract source
class Extract source
=> RegexLike regex source
class RegexLike regex source
=> RegexContext regex source target
where
match :: regex -> source -> target
matchM :: Monad m => regex -> source -> m target
A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it's possible to match a source and a regex to yield a target. (Other valid targets given these specific regex and source are Int, MatchResult String, MatchArray, etc.)
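To see the target-driven dispatch directly, here is a sketch that compiles one pattern with makeRegex and asks match for several result types. The meanings in the comments (Int counts matches, String is the first match, the triple is before/match/after) follow regex-base's RegexContext instances as I understand them; digits is a hypothetical name.

```haskell
import Text.Regex.PCRE

-- One compiled regex, reused with different result types.
digits :: Regex
digits = makeRegex "[0-9]+"

main :: IO ()
main = do
  let s = "a1 b22 c333"
  print (match digits s :: Bool)                      -- did it match at all?
  print (match digits s :: Int)                       -- number of matches
  print (match digits s :: String)                    -- first match
  print (match digits s :: (String, String, String))  -- (before, match, after)
```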
Put these together and it's pretty obvious that =~ and =~~ are simply convenience functions:
source1 =~ source
= match (makeRegex source) source1
source1 =~~ source
= matchM (makeRegex source) source1
and also that =~ and =~~ leave no room to pass various options to makeRegexOpts.
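The equivalence can be checked with a hypothetical regexMatch written only in terms of makeRegex and match. As with the library's own (=~), the regex type is pinned to Regex; otherwise nothing at the call site would determine it.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Regex.PCRE

-- A hand-rolled (=~): compile the pattern with default options, then match.
regexMatch :: ( RegexMaker Regex CompOption ExecOption source
              , RegexContext Regex source1 target )
           => source1 -> source -> target
regexMatch text pat = match (makeRegex pat :: Regex) text

main :: IO ()
main = do
  print (regexMatch "Hello" "l+" :: String)  -- "ll"
  print ("Hello" =~ "l+" :: String)          -- "ll"
```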
You could make your own versions that take the options alongside the pattern:
-- Like (=~), but the pattern comes paired with its compile and exec options.
(=~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target )
   => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
   = match (makeRegexOpts compOpt execOpt source) source1

-- Like (=~~). Newer regex-base releases declare matchM with MonadFail m
-- rather than Monad m; since MonadFail implies Monad, this compiles either way.
(=~~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target, MonadFail m )
   => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
   = matchM (makeRegexOpts compOpt execOpt source) source1
which could be used like this (with (.|.) imported from Data.Bits to combine flags):
"string" =~+ ("regex", compCaseless .|. compUTF8, execBlank) :: Bool
Or you could replace =~ and =~~ with versions that accept either a bare pattern or a (pattern, compOpt, execOpt) tuple:
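{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, UndecidableInstances #-}
import Text.Regex.PCRE hiding ((=~), (=~~))

-- Anything that can be turned into a compiled regex: a bare pattern,
-- or a (pattern, compile options, exec options) tuple.
class RegexSourceLike regex source where
    makeRegexWith :: source -> regex

-- A bare pattern gets the default options. OVERLAPPABLE lets the more
-- specific tuple instance below win when the source is a tuple.
instance {-# OVERLAPPABLE #-} RegexMaker regex compOpt execOpt source
    => RegexSourceLike regex source where
    makeRegexWith = makeRegex

instance RegexMaker regex compOpt execOpt source
    => RegexSourceLike regex (source, compOpt, execOpt) where
    makeRegexWith (source, compOpt, execOpt) = makeRegexOpts compOpt execOpt source

-- The regex type is fixed to Regex (as in the library's own (=~)),
-- otherwise it would be ambiguous at every call site.
(=~) :: (RegexSourceLike Regex source, RegexContext Regex source1 target)
     => source1 -> source -> target
source1 =~ source = match (makeRegexWith source :: Regex) source1

(=~~) :: (RegexSourceLike Regex source, RegexContext Regex source1 target, MonadFail m)
      => source1 -> source -> m target
source1 =~~ source = matchM (makeRegexWith source :: Regex) source1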
Or you could just use match, makeRegexOpts, etc. directly where needed.
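Used directly, makeRegexOpts also lets you compile a pattern once, with options, and reuse the compiled Regex across many inputs. A sketch, with errorLine and errorLines as hypothetical names:

```haskell
import Text.Regex.PCRE

-- Compile one case-insensitive pattern up front; the Regex value is reused
-- for every line instead of being recompiled on each match.
errorLine :: Regex
errorLine = makeRegexOpts compCaseless execBlank "^error:"

-- Here `match errorLine` is used at type String -> Bool,
-- so the Bool RegexContext instance answers "did it match?".
errorLines :: [String] -> [String]
errorLines = filter (match errorLine)

main :: IO ()
main = mapM_ putStrLn (errorLines ["Error: disk full", "ok", "ERROR: boom"])
```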