Search code examples
pythonparsingmatrixply

Python PLY Parser - Parsing Matrix as List of Lists


I'm creating a calculator using PLY and I want to be able to parse matrices like this one : [[11,1];[22,4];[13,3]] into a list of lists to be given to my own Matrix class for further calculation.

This is my code so far. The three important functions here are p_comma, p_semicolonand p_brack. The rest is purely for calculation and priority.

def p_operations(p): 
    """ expression : sixth
    sixth : fifth
    fifth : fourth
    fourth : third
    third : second
    second : first
    first : NUMBER
    first : IMAGINE
    """
    p[0] = p[1]

def p_comma(p):
    """ sixth : sixth ',' fifth """
    if isinstance(p[1], list):
        p[1].append(p[3])
        p[0] = p[1]
    else:
        p[0] = [p[1],p[3]]

def p_semicolon(p):
    """ sixth : sixth ';' fifth """
    if isinstance(p[1], list):
        p[1].append(p[3])
        p[0] = p[1]
    else:
        p[0] = [p[1],p[3]]

def p_plus(p):
    """ fifth : fifth '+' fourth """
    p[0] = p[1] + p[3]

def p_minus(p):
    """ fifth : fifth '-' fourth """
    p[0] = p[1] - p[3]

def p_implicit_times(p):
    """ fourth : fourth second """
    p[0] = p[1] * p[2]

def p_times(p):
    """ fourth : fourth '*' third """
    p[0] = p[1] * p[3]

def p_divide(p):
    """ fourth : fourth '/' third """
    p[0] = p[1] / p[3]

def p_modulo(p):
    """ fourth : fourth '%' third """
    p[0] = p[1] % p[3]

def p_floor_divide(p):
    """ fourth : fourth FLOORDIV third """
    p[0] = p[1] // p[3]

def p_unary_minus(p):
    """ third : '-' third """
    p[0] = -p[2]

def p_power(p):
    """ second : first '^' third """
    p[0] = p[1] ** p[3]

def p_paren(p):
    """ first : '(' expression ')' """
    p[0] = p[2]

def p_brack(p):
    """ first : '[' expression ']' """
    if type(p[2][0]) == list:
        p[0] = [p[2]]
    else:
        p[0] = Matrix.Matrix(p[2])

The problem here is that my solution doesn't work well with some tricky stuff like this : [[1]]and also the parsing is working even when there is no brackets, which is not what I want.

Most of all, I strongly believe that there is a way better solution to be found.

Can someone help me with that ?


Solution

  • Not everything is an expression :-)

    In particular, what goes inside matrix brackets is a list of rows separated by semicolons, and each row is a list of expressions separated by commas. Your grammar should reflect that simple fact, rather than just lumping these lists into the expression non-terminal. Otherwise, you will find that it accepts semicolon- and comma-separated lists out of context. Which I suppose is the basis of your question.

    Also, as I think we've already discussed, if your action function needs to do a test, it probably indicates that you are not taking advantage of your grammar. And that is indeed the case here.

    So let's start at the top. A matrix is a semicolon-separated list of rows enclosed in square brackets. In other words:

    matrix     : '[' row_list ']'
    row_list   : row
               | row_list ';' row
    

    A row is a comma-separated list of values (expressions, for now), enclosed in square brackets:

    row        : '[' value_list ']'
    value_list : expression
               | value_list ',' expression
    

    Now, we can write action functions. These are pretty simple, too.

    def p_list_first(p):
        """value_list : expression
           row_list   : row
        """
        p[0] = [ p[1] ]
    
    
    def p_list_extend(p):
        """value_list : value_list ','  expression
           row_list   : row_list ';' row
        """
        p[0] = p[1]
        p[0].append(p[3])
        # Another way of writing this action:
        #     p[0] = p[1] + [ p[3] ]
        # That's cleaner, in that it doesn't modify the previous value.
        # But it's less efficient because it creates a new list every time.
    
    def p_row(p):
        """row       : '[' value_list ']' """
        p[0] = p[2]
    
    def p_matrix(p):
        """matrix    : '[' row_list ']' """
        p[0] = Matrix.Matrix(p[2])
    

    That replaces your comma, semicolon, brack and sixth rules. The only thing left is to add first : matrix. (Also, the p_row action is identical to your p_paren action, so they could be combined, if you like.)

    Two take-aways:

    1. if you can describe the syntax in your own native language, you can write a grammar for it. The grammar is just a more formal way of saying the same thing. (At least, once you stop being intimidated by recursion, which isn't complicated either: "a list is a value or you can extend a list by adding a comma and a value" should be pretty easy to understand.)

    2. A grammar should be able to parse an input without having to test what was parsed before.

    This second principle is not absolute. In this case, for example, you might want to forbid the values in a row from being nested matrix elements. You could write that as a grammar, but it would involve an annoying amount of duplication; it might actually be simpler for the row_list actions to verify that the expression they are handling is not a Matrix.