In an effort to understand the capabilities of functional programming, I put together a few basic functions that can be composed to build complex regular expressions. After some testing I have found that this works, but you can write horrible code in any language that still works. Is this the kind of code a professional F# programmer would write, or am I abusing the feature?
Note: the definition of test is specifically what I am referring to.
type State = { input:string; index:int; succeeded:bool }
type Matcher = State -> State

let term (cs:char Set) =
    fun s ->
        if s.succeeded && s.index < s.input.Length && cs.Contains s.input.[s.index] then
            { input = s.input; index = s.index + 1; succeeded = true }
        else
            { input = s.input; index = s.index; succeeded = false }

let quantify (term, min, max) =
    // Apply term greedily at the current index, at most max times
    let rec inner (s:State, count) =
        if count < max then
            let next = term s
            if next.succeeded then inner (next, count + 1)
            elif count >= min then s // enough repetitions: keep the last good state
            else next                // too few repetitions: propagate the failure
        else
            s
    fun s -> if s.succeeded then inner (s, 0) else s

let disjunction leftTerm rightTerm =
    fun s ->
        let left = leftTerm s
        if not left.succeeded then
            let right = rightTerm s
            if not right.succeeded then
                { input = s.input; index = s.index; succeeded = false }
            else
                right
        else
            left

let matcher input terms =
    let r = terms { input = input; index = 0; succeeded = true }
    if r.succeeded then r.input.Substring (0, r.index) else null

let test = // (abc|xyz)a{2,3}bc
    disjunction // (abc|xyz)
        (term (set "a") >> term (set "b") >> term (set "c"))
        (term (set "x") >> term (set "y") >> term (set "z"))
    >> quantify (term (set "a"), 2, 3) // (a{2,3})
    >> term (set "b") // b
    >> term (set "c") // c

let main () : unit =
    printfn "%s" (matcher "xyzaabc" test)
    System.Console.ReadKey true |> ignore

main()

The code looks pretty good to me.
I'm not sure if this was your intention or a coincidence, but you're implementing something quite similar to "parser combinators", which are the topic of many academic papers :-). I think that "Monadic Parser Combinators" is quite readable (it has examples in Haskell, but you should be able to translate them to F#).
Regarding the function composition operator: I'm generally not a big fan of using it too much, because it often obfuscates the code. However, in your example it makes good sense, because you can easily imagine that >> means "this group should be followed by that group", which is easy to interpret.
The only minor change that I would make is to choose some nice custom operator for the disjunction operation and define a few more primitive operations, so that you can write, for example, this:
// Custom operator for the disjunction operation
let (<|>) left right = disjunction left right

// Test against several terms in sequence (compose them all, starting from the identity)
let sequence terms = terms |> Seq.fold (>>) id

// Test for a substring
let substring s = sequence [ for c in s -> term (set [c]) ]

let test = // (abc|xyz)a{2,3}bc
    ( substring "abc" <|> substring "xyz" )
    >> quantify 2 3 (term (set "a")) // (a{2,3})
    >> substring "bc" // bc

This is a higher-level description: it removes some of the >> operators in favor of functions that are more descriptive (and that encapsulate >>). I also changed quantify to take multiple arguments instead of a triple, which is a minor change.
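For completeness, here is a minimal self-contained sketch of how the pieces above could fit together. It is an illustration, not a definitive implementation: the body of the curried quantify (apply the term at the current index, count successful repetitions), the choice of <|> as the operator name, and a matcher that returns an option instead of null are all assumptions of this sketch.

```fsharp
type State = { input: string; index: int; succeeded: bool }

// Match one character from the set cs at the current index
let term (cs: Set<char>) =
    fun (s: State) ->
        if s.succeeded && s.index < s.input.Length && cs.Contains s.input.[s.index] then
            { s with index = s.index + 1 }
        else
            { s with succeeded = false }

// Try the left matcher first; fall back to the right one from the same state
let disjunction (left: State -> State) (right: State -> State) =
    fun (s: State) ->
        let l = left s
        if l.succeeded then l
        else
            let r = right s
            if r.succeeded then r else { s with succeeded = false }

// Assumed operator name for disjunction
let (<|>) left right = disjunction left right

// Curried quantify (assumed implementation): greedy, between min and max repetitions
let quantify (min: int) (max: int) (t: State -> State) =
    let rec inner (s: State) count =
        if count < max then
            let next = t s
            if next.succeeded then inner next (count + 1)
            elif count >= min then s
            else next
        else
            s
    fun (s: State) -> if s.succeeded then inner s 0 else s

// Compose a sequence of matchers into one matcher
let sequence terms = terms |> Seq.fold (>>) id

// Match a literal substring character by character
let substring (text: string) = sequence [ for c in text -> term (set [c]) ]

// Returns the matched prefix, or None when the pattern does not match
let matcher (input: string) (terms: State -> State) =
    let r = terms { input = input; index = 0; succeeded = true }
    if r.succeeded then Some (r.input.Substring(0, r.index)) else None

let test = // (abc|xyz)a{2,3}bc
    (substring "abc" <|> substring "xyz")
    >> quantify 2 3 (term (set "a"))
    >> substring "bc"

printfn "%A" (matcher "xyzaabc" test) // Some "xyzaabc"
printfn "%A" (matcher "xyzabc" test)  // None (only one "a")
```

Note that this quantify is greedy and never backtracks, so it is simpler than what a real regular expression engine does.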
If you want to play with this further, you can take a look at the article and try to write an F# computation expression builder that would allow you to use the parser { .. } syntax.
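As a starting point, a builder along those lines might look roughly like the sketch below. Everything here is hypothetical: the names Parser, ParserBuilder, item and run are made up for illustration, and the parser type differs from the State -> State matchers above in that each parser also returns a value.

```fsharp
// A parser reads from a position in the input and, on success,
// yields a value together with the new position
type Parser<'a> = Parser of (string * int -> ('a * int) option)

// Run a parser from the start of the input
let run (Parser p) (input: string) = p (input, 0)

type ParserBuilder() =
    // Succeed with x without consuming any input
    member this.Return (x: 'a) : Parser<'a> =
        Parser (fun (_, i) -> Some (x, i))
    // Run p, pass its result to f, then run the parser that f returns
    member this.Bind (p: Parser<'a>, f: 'a -> Parser<'b>) : Parser<'b> =
        let (Parser runP) = p
        Parser (fun (input, i) ->
            match runP (input, i) with
            | Some (x, j) ->
                let (Parser runQ) = f x
                runQ (input, j)
            | None -> None)

let parser = ParserBuilder()

// Match a single character from the set cs and return it
let item (cs: Set<char>) : Parser<char> =
    Parser (fun (input, i) ->
        if i < input.Length && cs.Contains input.[i] then Some (input.[i], i + 1)
        else None)

// "abc" written with the parser { .. } syntax
let abc =
    parser {
        let! a = item (set "a")
        let! b = item (set "b")
        let! c = item (set "c")
        return System.String([| a; b; c |])
    }

printfn "%A" (run abc "abcd") // Some ("abc", 3)
printfn "%A" (run abc "abx")  // None
```

Each let! sequences one parser after another, which is the role >> plays in the composed version.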