
How to write a textx grammar rule to detect standard datatypes without modifying them?


I want to write a textX grammar rule that can consist of either another defined rule or any kind of standard datatype (INT, FLOAT, STRING, etc.).

This is for a simple textX DSL in which conditions can be written (and eventually translated) using either other grammar rules (such as predefined functions) or any of the standard predefined datatypes (STRING/INT/FLOAT/BOOL/ID).

For example, I want to be able to write something like:

condition insert input data 5 equal 10 BEGIN
    ...
END

This stands for a normal IF. The insert input data 5 part is a rule that later gets translated into a normal function call, insertInputData(5). The grammar I use:

Model: commands*=Command;
Command: Function | Branch;
Function: Func_InsertInputData | Func_InsertOutputData;
Func_InsertInputData: 'insert input data' index=INT;
Func_InsertOutputData: 'insert output data' index=INT;
Branch: 'condition' condition=Condition 'BEGIN'
    commands*=Command
'END';
Condition: Cond_Equal | Cond_And | Cond_False;
Cond_Equal: op1=Operand 'equal' op2=Operand;
Cond_And: op1=Operand 'and' op2=Operand;
Cond_False: op1=Operand 'is false';
Operand: Function | OR_ANY_OTHER_KIND_OF_DATA;

In the interpreter, I try to read the code by doing this:

def translateCommands(cmds):
    commands = []
    for cmd in cmds:
        commands.append(translateCommand(cmd))
    return commands

def translateCommand(cmd):
    print(cmd)
    print(cmd.__class__)
    if cmd.__class__.__name__ == 'int' or cmd.__class__.__name__ == 'float':
        return str(cmd)
    elif cmd.__class__.__name__ == 'str':
        return '\'' + cmd + '\''
    elif(cmd.__class__.__name__ == 'Branch'):
        s = ''
        if(cmd.condition.__class__.__name__ ==  'Cond_Equal'):
            s = 'if ' + translateCommand(cmd.condition.op1) + '==' + translateCommand(cmd.condition.op2) + ':'
        if(cmd.condition.__class__.__name__ == 'Cond_And'):
            s = 'if ' + translateCommand(cmd.condition.op1) + ' and ' + translateCommand(cmd.condition.op2) + ':'
        # ...
        commandsInBlock = translateCommands(cmd.commands)
        for command in commandsInBlock:
            s += '\n    '+command
        return s
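
A possible way to keep literal operands unchanged during translation is to dispatch on the Python type of the value instead of comparing class names. The following is only a sketch under assumptions: translate_literal is a hypothetical helper that is not part of the interpreter above, and it covers only plain int, float, bool and str values.

```python
def translate_literal(value):
    """Render a primitive operand as Python source text without changing its type."""
    # bool must be tested before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return 'True' if value else 'False'
    if isinstance(value, (int, float)):
        return repr(value)  # an int keeps its integer form, a float keeps its decimal point
    if isinstance(value, str):
        return repr(value)  # adds quotes and escapes embedded quotes
    raise TypeError(f'unsupported operand type: {type(value).__name__}')


print(translate_literal(10))
print(translate_literal(10.0))
print(translate_literal('abc'))
```

Using repr for strings avoids building the quotes by hand, so a string that itself contains a quote still yields valid source text.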

For OR_ANY_OTHER_KIND_OF_DATA, I tried listing the actual datatypes, but this does not work. If I process the DSL code shown above with Function | FLOAT | INT | BOOL | ID | STRING as the Operand rule, the integers (the 10 after the equal in the example) get converted into floats:

if insertInputData(5)==10.0:

If I process the model with Function | INT | FLOAT | BOOL | ID | STRING as the Operand rule, I get an error:

textx.exceptions.TextXSyntaxError: None:13:43: error: Expected 'BEGIN' at position (13, 43) => 't equal 10*.0 BEGIN  '.

The result I would like to see is

if insertInputData(5)==10:

or

if insertInputData(5)==10.0:

with

condition insert input data 5 equal 10.0 BEGIN
    ...
END

but textX always seems to convert the value it finds at that position into the first matching type listed in the Operand rule, which is bad in this case. How do I have to modify my rule so that it detects every datatype appropriately without modifying anything?

EDIT 1

Igor Dejanović just described the problem and I followed the approach he gave.

grammar (the relevant part):

Command: Function | Branch | MyNumber;
#...
Operand: Function | MyNumber | BOOL | ID | STRING;
MyNumber: STRICTFLOAT | INT;
STRICTFLOAT: /[+-]?(((\d+\.(\d*)?|\.\d+)([eE][+-]?\d+)?)|((\d+)([eE][+-]?\d+)))(?<=[\w\.])(?![\w\.])/;

code:

from textx import metamodel_from_str

mm = metamodel_from_str(grammar)
# Convert every STRICTFLOAT match into a Python float
mm.register_obj_processors({'STRICTFLOAT': lambda x: float(x)})

dsl_code = '''
10
10.5
'''
model = mm.model_from_str(dsl_code)
commands = iterateThroughCommands(model.commands)

This results in

10
<class 'int'>

'10.5'
<class 'str'>

So something is still missing for the object processor to take effect: 10.5 arrives as a str instead of a float.


Solution

  • The problem is that every valid integer can also be interpreted as a FLOAT. If you order your rules as FLOAT | INT | ..., the FLOAT rule matches first and you get a float. If you order them as INT | FLOAT | ..., then for a float number the parser consumes only the part of the number before the . and parsing cannot continue from there.

    This is resolved in the development version of textX (please see CHANGELOG.md) by introducing a STRICTFLOAT rule, which never matches an integer. The built-in NUMBER rule is changed to first try STRICTFLOAT and then INT.

    The next release will be 2.0.0, and I hope it will happen in the next few weeks. In the meantime you can either install directly from GitHub or modify your grammar to have something like this:

    MyNumber: STRICTFLOAT | INT;
    STRICTFLOAT: /[+-]?(((\d+\.(\d*)?|\.\d+)([eE][+-]?\d+)?)|((\d+)([eE][+-]?\d+)))(?<=[\w\.])(?![\w\.])/;   // or the float format you prefer
    

    Then register an object processor for your STRICTFLOAT type that converts the match to a Python float. After upgrading to textX 2.0.0 you should just replace references to MyNumber with NUMBER in the grammar.

    More information can be found in the reported issue

    EDIT 1:

    The proposed solution is not working at the moment due to the bug reported here

    EDIT 2:

    The bug is fixed in the development version. Until 2.0.0 is released you have to run

    pip install https://github.com/textX/textX/archive/master.zip
    

    and then you don't need the workaround at all, unless you want to change the default types.
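
The ordering problem and the effect of STRICTFLOAT can be reproduced outside textX with Python's re module. This is only an illustrative sketch: the INT and FLOAT patterns are simplified stand-ins (assumptions, not textX's exact built-in regexes), ordered_choice is a hypothetical helper that mimics trying alternatives left to right, and only the STRICTFLOAT pattern is copied from the grammar above.

```python
import re

# Simplified stand-ins for the built-in INT and FLOAT matchers
# (assumptions for illustration, not textX's exact regexes).
INT = re.compile(r'[-+]?[0-9]+')
FLOAT = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?')

# The STRICTFLOAT regex quoted in the answer: it requires a '.' or an exponent.
STRICTFLOAT = re.compile(
    r'[+-]?(((\d+\.(\d*)?|\.\d+)([eE][+-]?\d+)?)|((\d+)([eE][+-]?\d+)))'
    r'(?<=[\w\.])(?![\w\.])'
)


def ordered_choice(text, alternatives):
    """Try each (name, pattern, convert) in order; the first match wins.

    Returns (rule name, converted value, unconsumed remainder).
    """
    for name, pattern, convert in alternatives:
        match = pattern.match(text)
        if match:
            return name, convert(match.group()), text[match.end():]
    raise ValueError(f'no alternative matches {text!r}')


float_first = [('FLOAT', FLOAT, float), ('INT', INT, int)]
int_first = [('INT', INT, int), ('FLOAT', FLOAT, float)]
strict_first = [('STRICTFLOAT', STRICTFLOAT, float), ('INT', INT, int)]

# FLOAT first: the integer 10 is matched by FLOAT and becomes a float.
print(ordered_choice('10 BEGIN', float_first))
# INT first: only '10' is consumed, so '.0 BEGIN' is left over and 'BEGIN' cannot follow.
print(ordered_choice('10.0 BEGIN', int_first))
# STRICTFLOAT first: integers fall through to INT, floats stay floats.
print(ordered_choice('10 BEGIN', strict_first))
print(ordered_choice('10.0 BEGIN', strict_first))
```

The second call shows the same leftover text that the TextXSyntaxError in the question points at (10*.0 BEGIN), while the last two calls keep each literal in its original type.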