
Python 3: tokenize library changes


According to this: http://code.activestate.com/lists/python-list/413540/, tokenize.generate_tokens should be used and not tokenize.tokenize.

This works perfectly fine in Python 2.6, but it no longer works in Python 3:

>>> a = list(tokenize.generate_tokens(io.BytesIO("1\n".encode()).readline))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/tokenize.py", line 439, in _tokenize
    if line[pos] in '#\r\n':           # skip comments or blank lines

However, this does work in Python 3 (and returns the desired output):

a = list(tokenize.tokenize(io.BytesIO("1\n".encode()).readline))

According to the documentation, it seems like tokenize.tokenize is the new way to use this module: http://docs.python.org/py3k/library/tokenize.html. tokenize.generate_tokens isn't even documented anymore.

But, why is there still a generate_tokens function in this module, if it's not documented? I haven't found any PEP regarding this.
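For what it's worth, generate_tokens does still run in Python 3 if I hand it a readline that yields str instead of bytes; this is just observed behaviour, not something the Python 3 documentation promises:

```python
import io
import tokenize

# Observed on Python 3: generate_tokens accepts a readline that
# returns str (text), while tokenize.tokenize wants bytes.
tokens = list(tokenize.generate_tokens(io.StringIO("1\n").readline))
print([tok[1] for tok in tokens])
```

So the error above apparently comes from feeding it a bytes readline, not from the function being removed.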

I'm trying to maintain a code base for Python 2.5-3.2, should I call generate_tokens for Python 2 and tokenize for Python 3? Aren't there any better ways?


Solution

  • generate_tokens really is a strange thing in Python 3: it no longer accepts the bytes readline that worked in Python 2. However, tokenize.tokenize behaves like the old Python 2 tokenize.generate_tokens. Therefore I wrote a little workaround:

    import sys
    import tokenize

    if sys.hexversion >= 0x03000000:
        tokenize_func = tokenize.tokenize
    else:
        tokenize_func = tokenize.generate_tokens


    Now I just use tokenize_func, which works without problems.
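One thing the alias alone doesn't hide is the input type: in Python 3 the readline you pass to tokenize.tokenize must yield bytes, while in Python 2 generate_tokens takes str. A sketch of how I call it (the variable names are my own):

```python
import io
import sys
import tokenize

if sys.hexversion >= 0x03000000:
    tokenize_func = tokenize.tokenize
    # Python 3: the readline passed in must return bytes.
    source = io.BytesIO("1\n".encode("utf-8"))
else:
    tokenize_func = tokenize.generate_tokens
    # Python 2: generate_tokens expects a readline returning str.
    source = io.StringIO(u"1\n")

tokens = list(tokenize_func(source.readline))
# Each token is a tuple; index 1 holds the token's string.
print([tok[1] for tok in tokens])
```

Under Python 3 the first token is the ENCODING token ('utf-8'), which callers may want to skip.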