Tags: python, unit-testing, tokenize, ply

Python tokenizer unit testing: insert one token into a generated token list


I implemented a Python tokenizer to extract tokens from a text file. Tokens correspond to strings that match a pattern (regular expression) I defined for each token type. I use the lexer functionality from the Python package ply to implement the tokenizer. After scanning the text file, all found tokens are returned as a generator. For unit testing I would like to insert additional tokens at defined places within the returned token list, to verify that the tokenizer handles such a bad-case situation correctly.

How can I create a "fake" token object with ply (Python module ply.lex) that I can insert into the token list?


Solution

  • You can easily construct your own tokens if you want to insert one into the lex stream. (How you actually insert the token is up to you, of course.)

    From the ply documentation:

    The tokens returned by lexer.token() are instances of LexToken. This object has attributes tok.type, tok.value, tok.lineno, and tok.lexpos.…

    The tok.type and tok.value attributes contain the type and value of the token itself. tok.lineno and tok.lexpos contain information about the location of the token. tok.lexpos is the index of the token relative to the start of the input text.

    In addition, the token has a lexer attribute whose value is the lexer object which created the token.

    Here's an example of the creation of a LexToken for a synthesized error token, adapted from lex.py (self at this point is the lexer object):

    tok = LexToken()
    tok.value = self.lexdata[lexpos:]   # remaining, unmatched input text
    tok.lineno = self.lineno            # current line number tracked by the lexer
    tok.type = 'error'                  # token type name
    tok.lexer = self                    # back-reference to the lexer that created it
    tok.lexpos = lexpos                 # index into the input text
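
    As a concrete illustration, below is a minimal, self-contained sketch of how such a synthesized token could be spliced into the token list inside a unit test. The lexer rules, the token names NUMBER and PLUS, and the helper make_fake_token are illustrative assumptions, not part of the original question:

    import ply.lex as lex
    from ply.lex import LexToken

    # --- illustrative lexer definition (assumption, not from the question) ---
    tokens = ('NUMBER', 'PLUS')

    t_PLUS = r'\+'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        t.lexer.skip(1)

    def make_fake_token(token_type, value, lineno=0, lexpos=0, lexer=None):
        """Construct a LexToken by hand, mirroring what ply does internally."""
        tok = LexToken()
        tok.type = token_type
        tok.value = value
        tok.lineno = lineno
        tok.lexpos = lexpos
        tok.lexer = lexer
        return tok

    def test_tokenizer_handles_unexpected_token():
        lexer = lex.lex()
        lexer.input("1 + 2")

        found = list(lexer)                     # drain lexer.token() into a list
        fake = make_fake_token('PLUS', '+', lexer=lexer)
        found.insert(2, fake)                   # inject the fake token mid-stream

        # hand 'found' to the code under test and assert on its behaviour
        assert [t.type for t in found] == ['NUMBER', 'PLUS', 'PLUS', 'NUMBER']

    Because LexToken is a plain class with ordinary attributes, a token built this way is indistinguishable from a real one to any code that only inspects type, value, lineno, and lexpos.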