I want to input a string and check whether it matches a certain regex; if it does not, I want to move on to the next regex until all my regexes are exhausted. For example, suppose I have the following 3 regexes:
Now suppose that the input string is:
- val str_input="7569"
I want to check str_input against regex_1 first; if it does not match, try regex_2; and if that does not match either, finally try regex_3. How can I do this with SML/NJ? Thank you.
You can achieve what you want using the regular expression library that SML/NJ provides. Its documentation can be found here: http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html
Here is a small getting-started example. First, you need to tell SML/NJ that you want to use the regexp library. You can do that with a .cm file, called sources.cm here (CM stands for Compilation Manager; a .cm file is a sort of Makefile for SML/NJ):
group is
  $/basis.cm      (* Load standard functions and modules. *)
  $/regexp-lib.cm (* Load the regexp library. *)
  main.sml        (* Load our own source file. *)
Now we can use the regexp library. Unfortunately, it's not entirely straightforward, because it relies on functors and readers. Basically, what you need is the RE.match function, which accepts a list of pairs, where the first element of each pair is a regexp and the second is a function that's called when that regexp matches. Given this list, RE.match tries to match the regexps against the input stream, starting at the current position of the stream. When one of them matches, it calls the function associated with that regexp, and the result of that function is the result of the whole RE.match call.
structure Main =
struct
  (**
   * RE is a module created by calling the module-level function (functor)
   * RegExpFn (Fn comes from functor), with two module arguments.
   *
   * The first argument, called P, is the syntax used to write regular
   * expressions in. In this particular case, it's the Awk syntax, which
   * is the only syntax provided by SML/NJ right now.
   *
   * The second argument, called E, is the RegExp engine used behind the
   * scenes to compile and execute the syntax. In this particular case
   * I've opted for ThompsonEngine, which implements Ken Thompson's
   * matching algorithm. Other options are BackTrackEngine and DfaEngine.
   *)
  structure RE = RegExpFn(
    structure P = AwkSyntax
    structure E = ThompsonEngine
    (* structure E = BackTrackEngine *)
    (* structure E = DfaEngine *)
  )

  fun main () =
    let
      (**
       * A list of (regexp, match function) pairs. The function called by
       * RE.match is the one associated with the regexp that matched.
       *
       * The match parameter is described here:
       * http://www.smlnj.org/doc/smlnj-lib/Manual/match-tree.html
       *)
      val regexes = [
        ("[a-zA-Z]*", fn match => ("1st", match)),
        ("[0-9]*", fn match => ("2nd", match)),
        ("1tom|2jerry", fn match => ("3rd", match))
      ]

      val input = "7569"
    in
      (**
       * StringCvt.scanString turns the `input` string into a character
       * stream and applies the scanner `RE.match regexes` to it, starting
       * at the first character.
       *
       * The result is NONE if no regexp matches; otherwise it is SOME of
       * whatever the selected match function above returned.
       *)
      StringCvt.scanString (RE.match regexes) input
    end
end
You can now use it like this from the command line:
$ sml sources.cm
Standard ML of New Jersey v110.79 [built: Sun Jan 3 23:12:46 2016]
[scanning sources.cm]
[library $/regexp-lib.cm is stable]
[parsing (sources.cm):main.sml]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-BASIS/(basis.cm):basis-common.cm is stable]
- Main.main ();
[autoloading]
[autoloading done]
val it = SOME ("2nd",Match ({len=4,pos=0},[]))
: (string * StringCvt.cs Main.RE.match) option
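If you literally want to try regex_1, then regex_2, then regex_3, one at a time, and only accept a regex that covers the whole input, something along these lines might work. This is only a sketch: matchesWhole and firstMatching are helper names I made up (they are not part of the library), the three patterns are the ones from the example above, and the whole-string check assumes that RE.prefix reports the longest match at the start of the string, which may depend on the engine you pick.

```sml
(* Sketch only: matchesWhole and firstMatching are hypothetical helpers,
 * not functions provided by the SML/NJ regexp library. *)
structure RE = RegExpFn(
  structure P = AwkSyntax
  structure E = ThompsonEngine
)

(* True when the match found at the start of `s` spans all of `s`.
 * Assumption: RE.prefix returns the longest match at that position. *)
fun matchesWhole re s =
  case StringCvt.scanString (RE.prefix re) s of
      SOME (MatchTree.Match ({len, pos = _}, _)) => len = String.size s
    | NONE => false

(* Try each (name, pattern) pair in order and return the name of the first
 * pattern that matches the whole input, or NONE once all are exhausted. *)
fun firstMatching patterns s =
  case List.find (fn (_, pat) => matchesWhole (RE.compileString pat) s) patterns of
      SOME (name, _) => SOME name
    | NONE => NONE

val regexes = [
  ("regex_1", "[a-zA-Z]*"),
  ("regex_2", "[0-9]*"),
  ("regex_3", "1tom|2jerry")
]

val result = firstMatching regexes "7569"
```

The difference from the RE.match version is that the order of the list decides which regex wins, and a regex that only matches an empty or partial prefix (such as "[a-zA-Z]*" against a string of digits) is skipped instead of being reported as a match. The code needs the same $/regexp-lib.cm entry in the .cm file as before.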