textx

How to parse keywords and strings from a line of text


I have a file keywords.tx with:

Commands:
    keywords = 'this' & 'way'
;
StartWords:
    keywords = 'bag'
;

Then I have a file mygram.tx with:

import keywords

MyModel:
    keyword*=StartWords[' ']
    name+=Word[' ']
;
Word:
    text=STRING
;


My data file has one line: "bag hello soda this way". I would like the result to have the attributes keyword='bag', name='hello soda' and command='this way'.

I'm not sure how to get the grammar to handle the pattern "keywords words keywords" while making sure that the second group of keywords is not included in the words. Another way to express it is "start words, words, commands".


Solution

  • If I understood your goal correctly, you can do something like this:

    from textx import metamodel_from_str
    
    mm = metamodel_from_str('''
    File:
        lines+=Line;
    
    Line:
        start=StartWord
        words+=Word
        command=Command;
    
    StartWord:
        'bag' | 'something';
    
    Command:
        'this way' | 'that way';
    
    Word:
        !Command ID;
    ''')
    
    # Named `text` rather than `input` to avoid shadowing the Python builtin.
    text = '''
    bag hello soda this way
    bag hello soda that way
    something hello this foo this way
    '''
    
    model = mm.model_from_str(text)
    
    assert len(model.lines) == 3
    line = model.lines[1]
    assert line.start == 'bag'
    assert line.words == ['hello', 'soda']
    assert line.command == 'that way'
    
    

    There are several things to note:

    • You don't have to specify [' '] as a separator in your repetitions, because whitespace is skipped by default,
    • To specify alternatives use |,
    • You can use the syntactic predicate ! to check whether something is ahead and proceed only if it isn't. In the rule Word this ensures that commands are not consumed by the Word repetition in the Line rule,
    • You can add more start words and commands simply by adding more alternatives to these rules,
    • If you want to be more permissive and capture commands even when the user puts several whitespace characters between the command words (e.g. this    way), you can either use regex matches or specify the match like this:
    Command:
        'this ' 'way' | 'that ' 'way';
    

    which will match a single space as part of this and then an arbitrary amount of whitespace before way, which is discarded.

    There is comprehensive documentation with examples on the textX site, so I suggest taking a look and going through some of the provided examples.