
Ply's lex is not counting newlines


I'm trying to write a program that counts certain constructs in a C program. The problem I have is that I'm trying to count lines with:

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

It doesn't count the lines. Here is an example of the input and the output:

for
if
else
switch
exit
Number of if´s: 1
Number of for´s: 1
Number of While´s: 0
Number of else´s: 1
Number of switche´s: 1
Number of lines: 1

But every time I press enter to write a new line of code, it doesn't get counted. Also, if I press enter without writing anything, this error appears:

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/practicas/firma_digital.py", line 80, in <module>
    if tok.type is not None:
AttributeError: 'NoneType' object has no attribute 'type'

Here is all my code:

import ply.lex as lex
import ply.yacc as yacc
FinishProgram=0
Enters=0
Fors=0
Whiles=0
ifs=0
elses=0
Switches=0

reserved = {
   'if' : 'IF',
   'for' : 'FOR',
   'while': 'WHILE',
   'else': 'ELSE',
   'switch': 'SWITCH'
}
tokens = [
    'ID',
    'COLON',
    'SEMICOLON',

    ]+ list(reserved.values())  # Reserved words

t_COLON= r','
t_SEMICOLON=r';'


def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value, 'ID')
    return t

t_ignore=r' '

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

def t_error(t):
    print("This thing failed")
    t.lexer.skip(1)

lexer=lex.lex()


#def p_gram_sets(p):
 #   '''

  #  gram : SETS SEMICOLON
   #      | empty
    #'''
    #if p[1]:
     #   print(p[1])
      #  print("SETS")



def p_empty(p):
    '''
    empty :
    '''
    p[0]=None





def p_error(p):
    print("Syntax error in input!")


parser=yacc.yacc()

while FinishProgram==0:
    s=input('')
    lexer.input(s)
    tok = lexer.token()

    if tok.type is not None:
        if tok.type=='IF':
            ifs+=1
        elif tok.type=='FOR':
            Fors+=1
        elif tok.type=='WHILE':
            Whiles+=1
        elif tok.type=='ELSE':
            elses+=1
        elif tok.type=='SWITCH':
            Switches+=1

    #parser.parse(s)
    if "exit" in s:
        print("Number of if´s: "+ str(ifs) + "\n"+"Number of for´s: "+str(Fors)+"\n"+"Number of While´s: "+str(Whiles)+"\n"+"Number of else´s: "+str(elses)+"\n"+"Number of switche´s: "+str(Switches)+"\n"+"Number of lines: "+str(tok.lineno))
        FinishProgram=1

Solution

  • It's not that ply is not counting the newline characters. It never sees them, because you feed it one line at a time using input().

    From the Python docs (emphasis added):

    input([prompt])

    If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that.
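
    You can observe this stripping directly without ply; this standalone sketch redirects stdin to simulate typed input:

    ```python
    import io
    import sys

    # Simulate a user typing two lines, each ended with enter.
    sys.stdin = io.StringIO("for\nif\n")

    line = input()           # reads the first line
    print(repr(line))        # 'for' -- the trailing newline is already gone
    assert "\n" not in line
    ```

    Since every string handed to lexer.input() arrives with its newline already stripped, the t_newline rule can never match.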

    The normal usage of lex.lex is to hand the lexer the entire input at once, so that it actually sees (and can count) every newline character.

    Additionally, you are printing

    ... + str(tok.lineno)
    

    rather than

    ... + str(lexer.lineno)
    

    After the last token is tokenised, lex.lex returns None, so you can expect tok to be None when your loop terminates, and therefore it is an error to try to extract its lineno attribute. (However, in your case it only happens if the line you just tried to tokenise was empty, because you only use the first token on each line.) You want the line count recorded in the lexer object, which is the count you update in your t_newline action.

    If you want to work on an entire file (which is the usual case for parsers, other than line-by-line calculators), you need to read the entire contents of the file (or stdin, as the case may be). For non-interactive use, you would generally do that with the file object's read function. If you wanted to test your lexer, you could use the fact that the lexer object implements Python's iteration protocol, so it will work in a for statement. So your main loop would be something like:

    import sys
    lexer.input(sys.stdin.read())
    for tok in lexer:
      pass  # update the count for tok.type here
    

    and you would terminate the input by typing an end-of-file character at the beginning of a line (control-D on Linux or control-Z on Windows).
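
    By contrast with input(), reading the whole stream at once preserves the newlines, so a t_newline rule would actually fire. A small ply-free sketch, again using a redirected stdin:

    ```python
    import io
    import sys

    # Simulate a three-line program arriving on stdin.
    sys.stdin = io.StringIO("for\nif\nelse\n")

    text = sys.stdin.read()
    # All newline characters survive, ready to be matched and counted.
    assert text.count("\n") == 3
    ```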

    Personally, I would implement the token type counting with a defaultdict:

    from collections import defaultdict
    counts = defaultdict(int)
    for tok in lexer:
      counts[tok.type] += 1
    for toktype, count in counts.items():
      print("Number of %s's: %d" % (toktype, count))
    # Or: print('\n'.join("Number of %s's: %d" % (t, c) for t, c in counts.items()))
    print("Number of lines: %d" % lexer.lineno)
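
    The defaultdict pattern can be exercised on its own with stand-in token objects; SimpleNamespace here merely mimics the .type attribute of ply's LexToken for illustration:

    ```python
    from collections import defaultdict
    from types import SimpleNamespace

    # Fake tokens carrying only the .type attribute the counting loop reads.
    stream = [SimpleNamespace(type=t) for t in ("IF", "FOR", "IF", "ELSE", "SWITCH")]

    counts = defaultdict(int)
    for tok in stream:
        counts[tok.type] += 1  # missing keys start at 0 automatically

    assert counts["IF"] == 2
    assert counts["FOR"] == 1
    ```

    Because defaultdict(int) creates missing entries as 0, no per-keyword variables or elif chains are needed, and adding a new reserved word requires no change to the counting code.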