I'm trying to write a regex in F# that will match things like this
.float -.05, 2.4
.float 31.1234
.float -0.5, 1.0, 1.1
I'm trying something like this:
open System.Text.RegularExpressions

let matchFloat input =
    let matches = Regex(@"(\.float )?(?<float>[+-]?\d*\.\d+)").Matches(input)
    ([for m in matches -> m.Groups.["float"].Value], matches.Count > 0)
This kind of works, but I have the same thing for .double, and whichever directive comes first in my match expression is the one that gets matched. Because the directive is optional ("occurs 0 or 1 times"), the strings of floating point numbers following either directive are treated the same.
So how do I make sure the .float is there, without doing input.StartsWith(...)? I know there is a way I can write this regex so that it will match appropriately, and m.Groups.["float"].Value will return only what I need without having to remove spaces or commas after the fact.
I have been messing with this thing for hours and just can't get it to do what I want. I've tried using the lookbehind/lookahead stuff, and a few other things, but no luck.
Well, this gets you well on your way to step 1 of fixing a Linux machine
You can use positive lookbehind combined with alternation to capture either .float or .decimal at the start of the line into a group, then check which one was captured. The lookbehind itself does not contribute to the primary capture, so the numerical digits are still the only thing in "group 0".
Then my favorite tricksy bit - by adding a .* within the lookbehind (after float or decimal), you can successfully return multiple matches from the input string, each sharing the initial .float or .decimal, but then each zooming forward to capture a different set of digits.
Putting a bow on it with a little DU type to represent the two cases:
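open System.Text.RegularExpressions

type DataPoint =
    | Float of string
    | Decimal of string

let parse input =
    // Lookbehind: the line starts with .float or .decimal (groups 2 and 3),
    // followed by anything, then whitespace. Main match: one signed number.
    // Lookahead: the number is followed by a comma or the end of the input.
    let patt = @"(?<=^\.((float)|(decimal)).*(,?\s+))[+-]?\d*\.\d+(?=\s*(,|$))"
    Regex.Matches(input, patt)
    |> Seq.cast<Match>
    |> Seq.map (fun m ->
        match (m.Groups.[2].Success, m.Groups.[3].Success) with
        | (true, false) -> Float(m.Value)
        | (false, true) -> Decimal(m.Value)
        | _ -> failwith "??")
    |> List.ofSeq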
// positive cases
parse ".float -.05, 2.4" // [Float "-.05"; Float "2.4"]
parse ".float 31.1234" // [Float "31.1234"]
parse ".float -0.5, 1.0, 1.1" // [Float "-0.5"; Float "1.0"; Float "1.1"]
parse ".decimal 123.456, -22.0" // [Decimal "123.456"; Decimal "-22.0"]
// negative cases, plucks out valid bits
parse ".decimal xyz,,.., +1.0, .2.3.4, -.2 " // [Decimal "+1.0"; Decimal "-.2"]
parse ".float 1.0, 2.0-, 3." // [Float "1.0"]
Note that I've just relied on the group numbers; you might want to be more careful and use named groups.
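For what it's worth, here is a rough sketch of what a named-group version could look like. The names namedPattern, parseNamed and the group name "directive" are my own placeholders, not anything from the code above, and the DataPoint type is repeated only so the snippet stands on its own. The other groups are made non-capturing so there is nothing left to count.

```fsharp
open System.Text.RegularExpressions

type DataPoint =
    | Float of string
    | Decimal of string

// Same shape as the pattern above, but the directive is captured by name
// ("directive" is a hypothetical group name chosen for this sketch).
let namedPattern =
    @"(?<=^\.(?<directive>float|decimal).*,?\s+)[+-]?\d*\.\d+(?=\s*(?:,|$))"

let parseNamed (input: string) =
    Regex.Matches(input, namedPattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m ->
        // The lookbehind still fills the named group for every match.
        match m.Groups.["directive"].Value with
        | "float" -> Float m.Value
        | "decimal" -> Decimal m.Value
        | other -> failwithf "unexpected directive: %s" other)
    |> List.ofSeq

parseNamed ".float -.05, 2.4"        // [Float "-.05"; Float "2.4"]
parseNamed ".decimal 123.456, -22.0" // [Decimal "123.456"; Decimal "-22.0"]
```

Matching on the captured text rather than on group positions means that adding or removing parentheses elsewhere in the pattern won't silently change which case you land in.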
Also worth noting that .NET is one of the few regex environments that supports full alternation and .* matching within a lookbehind, so this might not be portable.
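If portability matters, one possible fallback is to drop the lookbehind and do it in two steps: match the directive once, then validate each comma-separated token. This is only a sketch under my own assumptions; linePattern, numberPattern and parseTwoStep are hypothetical names, and it does trim and split after the fact, which the question wanted to avoid. It is also slightly stricter than the lookbehind version, since each token between commas must be a number in its entirety.

```fsharp
open System.Text.RegularExpressions

// Step 1: require the directive at the start and capture the rest of the line.
let linePattern = Regex(@"^\.(?<directive>float|decimal)\s+(?<rest>.*)$")

// Step 2: a token counts only if the whole token is one signed number.
let numberPattern = Regex(@"^[+-]?\d*\.\d+$")

let parseTwoStep (input: string) =
    let m = linePattern.Match(input)
    if not m.Success then
        None
    else
        let numbers =
            m.Groups.["rest"].Value.Split(',')
            |> Array.map (fun s -> s.Trim())
            |> Array.filter (fun s -> numberPattern.IsMatch(s))
            |> List.ofArray
        Some (m.Groups.["directive"].Value, numbers)

parseTwoStep ".float -.05, 2.4"     // Some ("float", ["-.05"; "2.4"])
parseTwoStep ".float 1.0, 2.0-, 3." // Some ("float", ["1.0"])
parseTwoStep "1.0, 2.0"             // None
```

Neither pattern here needs a lookbehind, so the same idea should carry over to other engines, though the named-group syntax may need adjusting.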
Edit: I hardened the pattern somewhat against negative input based on feedback. Still not bulletproof.