
How to limit repetitions in textx grammar?


I'm trying to create a grammar in textx.

The syntax should look like this:

name_a
name_a, name_b
name_a, name_b: name_c, *
name_a, name_b: name_c, *, name_d
*, name_d
*

Here the asterisk (*) means "all", and it should appear at most once per line. My current grammar is:

Subsets: ColumnsSet*;
ColumnsSet: SetItem (',' ColumnsSet)*;
SetItem: ColumnName | Star;
Star: '*';
ColumnName: name=ID (':' rename=ID)*;

This grammar allows the asterisk to repeat. I want to prevent that, so that lines like these are invalid:

name_a, *, *
name_a, *, name_b, *
*, name_a, *

How should I rewrite the grammar?

Is there a way to flatten the output of the nested rule: ColumnsSet: SetItem (',' ColumnsSet)*;?


Solution

  • There are several issues with your grammar. First, in textX you need to use assignments to collect the relevant data. Use the *= and += assignment styles to denote zero-or-more and one-or-more matches, respectively. Use separator repetition modifiers (for example, +=SetItem[',']) to avoid the Something (',' Something)* boilerplate; the matched items are then collected into a single flat list instead of a nested structure.

    To prevent * from appearing more than once, you can register an object processor that checks for semantic errors.

    Also, textX skips whitespace, including newlines, by default. To ensure that the language is line-oriented, you may need to look into the noskipws rule modifier.

    textX is not just a parser: from the grammar it deduces a meta-model of the language, which you can visualize.

    Here is one (probably incomplete) solution that can serve as a good starting point.

    from textx import metamodel_from_str, TextXSemanticError
    from textx.scoping.tools import get_location
    
    grammar = r'''
        Subsets: col_sets+=ColumnsSet;
        ColumnsSet: set_items+=SetItem[','];
        SetItem: ColumnName | Star;
        Star: '*';
        ColumnName: name=ID (':' rename=ID)?;
    '''
    
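    def column_set_proc(cs):
        # Called for every ColumnsSet object once it has been constructed.
        # Star is a match rule, so a matched * appears in set_items as the string '*'.
        if len([x for x in cs.set_items if x == '*']) > 1:
            raise TextXSemanticError('Cannot use multiple * in a single line', **get_location(cs))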
    
    mm = metamodel_from_str(grammar)
    mm.register_obj_processors({'ColumnsSet': column_set_proc})
    
    # This will pass
    model = mm.model_from_str(r'''name_a
    name_a, name_b
    name_a, name_b: name_c, *
    name_a, name_b: name_c, *, name_d
    *, name_d
    *
    ''')
    
    # Each of these raises TextXSemanticError
    count = 0
    for invalid in ['name_a, *, *', ' name_a, *, name_b, *', ' *, name_a, *']:
        try:
            mm.model_from_str(invalid)
        except TextXSemanticError:
            count += 1
    
    assert count == 3