How to parse a string without any surrounding delimiters with pegjs?


Given this text:

this is title. {this is id} [this is type] (this is description)

I want to get the following object:

{
    id: 'this is id',
    title: 'this is title',
    type: 'this is type',
    description: 'this is description',
}

These are my pegjs rules:

start = Text

_ "whitespace"
  = [ \t\n\r]*

Text
    = _ body:Element? _ {
        return {
            type: "Text",
            body: body || [],
        }
    }

Element
    = _ title:Title _ id:Id _ type:Type _ description:Description _ {
        return {
            id: id,
            title: title,
            type: type,
            description: description,
        }
    }

Type
    = "[" type: Literal "]" {
        return type;
    }

Id
    = '{' id: Literal '}' {
        return id;
    }

Title
    = Literal

Description
    = "(" description: Literal ")" {
        return description;
    }

Literal "Literal"
    = '"' char:DoubleStringCharacter* '"' {
        return char.join("");
    }

DoubleStringCharacter
    = !'"' . {
        return text();
    }

My question: how do I match a string that has no surrounding quote syntax?

I know the Literal rule is the problem, but I don't know how to fix it. Can anyone help?


Solution

  • Your Literal rule only accepts double-quoted strings, and none of the fields in your input are quoted. Instead, let each rule match everything up to its closing delimiter: for the id, match until you find a }; for the type, until a ]; for the description, until a ); and for the title, until a . (period). The $ operator returns the matched text as a single string rather than an array of characters. With these changes, your Element rule produces the result you want.

    start = Text
    
    _ "whitespace"
      = [ \t\n\r]*
    
    Text
        = _ body:Element? _ {
            return {
                type: "Text",
                body: body || [],
            }
        }
    
    Element
        = _ title:Title _ id:Id _ type:Type _ description:Description _ {
            return {
                id: id,
                title: title,
                type: type,
                description: description,
            }
        }
    
    Type
        = "[" type: $[^\]]* "]" {
            return type;
        }
    
    Id
        = '{' id: $[^}]* '}' {
            return id;
        }
    
    Title
        = s:$[^.]* '.' _ {return s}
    
    Description
        = "(" description: $[^)]* ")" {
            return description;
        }
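
To see the "match everything up to the closing delimiter" idea outside of a grammar, here is a minimal sketch in plain JavaScript. It is not pegjs: it swaps in a single regular expression that uses the same negated character classes as the rules above ([^.], [^}], [^\]], [^)]). The function name parseElement is hypothetical and exists only for this illustration.

```javascript
// Plain-JavaScript sketch (not pegjs) of the same idea:
// each field is "everything up to its closing delimiter".
// parseElement is a hypothetical helper name used only here.
function parseElement(input) {
  // title: up to the first "."   id: inside { }
  // type:  inside [ ]            description: inside ( )
  const pattern =
    /^\s*([^.]*)\.\s*\{([^}]*)\}\s*\[([^\]]*)\]\s*\(([^)]*)\)\s*$/;
  const match = pattern.exec(input);
  if (match === null) {
    throw new SyntaxError(
      "Input does not match: title. {id} [type] (description)"
    );
  }
  const [, title, id, type, description] = match;
  return { id, title, type, description };
}

const result = parseElement(
  "this is title. {this is id} [this is type] (this is description)"
);
console.log(result);
```

As with the Title rule in the grammar, this approach assumes the title itself never contains a period, because the first . ends the title.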