f# fparsec

Advice on FParsec for handling whitespace


I have a subexpression for parsing 'quotes', which have the following format:

"5.75 @ 5.95"

So I wrote this FParsec expression to parse it:

let pquote x = (sepBy (pfloat) ((spaces .>> (pchar '/' <|>  pchar '@' )>>. spaces))) x

It works fine, except when there is a trailing space in my input, because the separator expression then starts to consume content before failing. So I wrapped the separator in attempt, which works and seems, from what I understand, to be more or less what attempt is meant for.

let pquote x = (sepBy (pfloat) (attempt (spaces .>> (pchar '/' <|>  pchar '@' )>>. spaces))) x

As I don't know FParsec very well, I wonder whether there is a better way to write this. It seems a bit heavy (while still being very manageable, of course).
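For readers who want to reproduce the trailing-space behaviour described above, here is a minimal sketch. It assumes an F# script (.fsx) with NuGet access and a `unit` user state; the names `sep`, `pquoteNoAttempt`, `pquoteAttempt` and `succeeded` are made up for this illustration and are not part of the question.

```fsharp
#r "nuget: FParsec"
open FParsec

// Separator from the question: optional whitespace, then '/' or '@', then optional whitespace.
let sep : Parser<unit, unit> =
    spaces .>> (pchar '/' <|> pchar '@') >>. spaces

// First version: if sep consumes the trailing space and then fails, sepBy fails too.
let pquoteNoAttempt : Parser<float list, unit> = sepBy pfloat sep

// Second version: attempt backtracks to the position before the consumed whitespace.
let pquoteAttempt : Parser<float list, unit> = sepBy pfloat (attempt sep)

// Hypothetical helper: reduces a parse result to success or failure.
let succeeded (p: Parser<float list, unit>) (input: string) =
    match run p input with
    | Success(_, _, _) -> true
    | Failure(_, _, _) -> false

printfn "%b" (succeeded pquoteNoAttempt "5.75 @ 5.95")   // no trailing space
printfn "%b" (succeeded pquoteNoAttempt "5.75 @ 5.95 ")  // trailing space, the problem case
printfn "%b" (succeeded pquoteAttempt "5.75 @ 5.95 ")    // trailing space, with attempt
```

Note that with attempt the trailing whitespace is left unconsumed, so a following eof parser would still need a preceding spaces.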


Solution

  • let s1 = "5.75         @             5.95              "
    let s2 = "5.75/5.95   "
    let pquote: Parser<_> =
        pfloat
        .>> spaces .>> skipAnyOf ['@'; '/'] .>> spaces
        .>>. pfloat
        .>> spaces
    

    Notes:

    1. spaces skips any sequence of zero or more whitespace characters, so whitespace is already optional everywhere and there is no need to use opt (thanks @Daniel);
    2. type Parser<'t> = Parser<'t, UserState> - I define it this way in order to avoid the "value restriction" error; you may remove it;
    3. (Retracted) I originally suggested setting System.Threading.Thread.CurrentThread.CurrentCulture <- Globalization.CultureInfo.GetCultureInfo "en-US" for systems whose default language settings use a decimal comma. This won't work, thanks @Stephan;
    4. I would not use sepBy unless I have a value list of unknown size.
    5. If you don't really need the value returned (e.g. the '@' character), it is recommended to use the skip* functions instead of the p* ones for performance reasons.

    Update: added slash as a separator.
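To try the answer's parser end to end, the following sketch puts the pieces together as an F# script. It assumes NuGet access and that UserState is simply `unit`, which the answer does not spell out; the helper `parseQuote` and the names `left` and `right` are hypothetical and only serve the illustration.

```fsharp
#r "nuget: FParsec"
open FParsec

// Assumption: no user state is needed, so UserState is unit.
type UserState = unit
// Alias from note 2; pinning the user state type avoids the "value restriction" error.
type Parser<'t> = Parser<'t, UserState>

let s1 = "5.75         @             5.95              "
let s2 = "5.75/5.95   "

// Exactly two floats around one '@' or '/', with optional whitespace anywhere.
// skipAnyOf discards the separator character instead of returning it (note 5).
let pquote : Parser<float * float> =
    pfloat
    .>> spaces .>> skipAnyOf ['@'; '/'] .>> spaces
    .>>. pfloat
    .>> spaces

// Hypothetical helper: Some (left, right) on success, None on failure.
let parseQuote (input: string) =
    match run pquote input with
    | Success((left, right), _, _) -> Some (left, right)
    | Failure(_, _, _) -> None

printfn "%A" (parseQuote s1)
printfn "%A" (parseQuote s2)
printfn "%A" (parseQuote "5.75 # 5.95")  // '#' is not an accepted separator
```

Because the shape is fixed at two values, the result is a tuple rather than the list that sepBy would return, which is the point of note 4.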