I implemented a Python tokenizer that extracts tokens from a text file. Each token corresponds to a string that matches a pattern (regular expression) I defined for that token. I use the lexer functionality from the Python package ply to implement the tokenizer. After scanning the text file, all found tokens are returned as a generator. For unit testing, I would like to insert additional tokens at defined places within the "returned token list" to verify that the tokenizer handles such a bad-case situation correctly.
How can I create a "fake" token object with ply (Python module ply.lex) that I can insert into the token list?
You can easily construct your own tokens if you want to insert a token into the lex stream. (How you actually insert the token is up to you, of course.)
From the ply documentation:

The tokens returned by lexer.token() are instances of LexToken. This object has attributes tok.type, tok.value, tok.lineno, and tok.lexpos. … The tok.type and tok.value attributes contain the type and value of the token itself. tok.lineno and tok.lexpos contain information about the location of the token. tok.lexpos is the index of the token relative to the start of the input text.

In addition, the token has a lexer attribute whose value is the lexer object which created the token.
Here's an example of the creation of a LexToken (adapted from lex.py), for a synthesized error token (self at this point is the lexer object):
tok = LexToken()
tok.value = self.lexdata[lexpos:]
tok.lineno = self.lineno
tok.type = 'error'
tok.lexer = self
tok.lexpos = lexpos
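For the unit-test scenario from the question, one way to put this together is sketched below. Only ply.lex.LexToken comes from ply itself; the helper names make_fake_token and insert_token, the token type 'FAKE', and the tokenize() generator are placeholders standing in for whatever the real test suite defines.

import ply.lex as lex

def make_fake_token(type_, value, lineno=0, lexpos=0):
    # Build a LexToken by hand, setting the same attributes lex.py sets.
    tok = lex.LexToken()
    tok.type = type_
    tok.value = value
    tok.lineno = lineno
    tok.lexpos = lexpos
    return tok

def insert_token(token_stream, position, fake_token):
    # Re-yield the original tokens, emitting fake_token just before the
    # token at the given index.
    emitted = False
    for index, tok in enumerate(token_stream):
        if index == position and not emitted:
            yield fake_token
            emitted = True
        yield tok
    if not emitted:
        # position was past the end of the stream: append the fake token.
        yield fake_token

# Usage in a test, assuming a tokenize() generator like the one described
# in the question:
#   bad_stream = insert_token(tokenize(text), 2, make_fake_token('FAKE', '!!'))
#   feed bad_stream to the code under test and assert on its behaviour

If the code under test reads tok.lexer, set that attribute on the fake token as well, for example to the lexer instance the test created.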