parsing, f#, lexical-analysis, fsyacc, fslex

How to capture a string without quote characters


I'm trying to capture quoted strings without the quotes. I have this terminal

%token <string> STRING

and this production

constant:
    | QUOTE STRING QUOTE { String($2) }

along with these lexer rules

| '\''       { QUOTE }
| [^ '\'']*  { STRING (lexeme lexbuf) } //final regex before eof

The lexer seems to treat everything leading up to a QUOTE as a single STRING lexeme, so the input doesn't parse. Maybe my problem is elsewhere in the grammar, I'm not sure. Am I going about this the right way? It was parsing fine before I tried to exclude the quotes from strings.

Update

I think there may be some ambiguity with the following lexer rules

let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*

The following rule comes before the STRING rule

| identifier    { ID (lexeme lexbuf) }

Is there any way to disambiguate these without including quotes in the STRING regex?


Solution

  • It's pretty normal to do semantic analysis in the lexer for constants like strings and numeric literals, so you might consider a lexer rule for your string constants like this:

    | '\'' [^ '\'']* '\'' 
        { STRING (let s = lexeme lexbuf in s.Substring(1, s.Length - 2)) }
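
To make the action in that rule easier to see, here is a rough sketch of the quote-stripping pulled out into a standalone F# function. The name `unquote` and the sample inputs are my own illustration, not part of fslex or the rule above; the length and quote checks are extra guards the lexer rule doesn't strictly need, since it only ever matches text that starts and ends with a quote.

```fsharp
// Hypothetical helper: `unquote` is an illustrative name, not an fslex API.
// Drops the first and last character of a lexeme when both are single quotes.
let unquote (s: string) : string =
    if s.Length >= 2 && s.[0] = '\'' && s.[s.Length - 1] = '\'' then
        s.Substring(1, s.Length - 2)
    else
        s // not a quoted lexeme, so return it unchanged

// Sample lexemes, written as the rule  '\'' [^ '\'']* '\''  would match them.
printfn "%s" (unquote "'hello world'")   // prints: hello world
printfn "%s" (unquote "'a.b_c'")         // prints: a.b_c
printfn "[%s]" (unquote "''")            // prints: []
```

If you go this way, the parser presumably no longer needs the QUOTE token, and the production would shrink to something like `| STRING { String($1) }`. It should also sidestep the clash with `identifier`: a quoted string has to start with `'` and an identifier has to start with a letter, so the two rules can never match at the same position.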