Tokenize/untokenize a Python code string so that it is compatible with interactive mode


I have Python code with this kind of structure:

def main():

    ''' comment '''
    if True:
        print("do")
    print("done")

This code is not compatible with interactive mode (for example, when I copy and paste it into an interactive session). To work there, it would need to be:

def main():
    ''' comment '''
    if True:
        print("do")

    print("done")

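Otherwise the interactive interpreter fails with indentation errors: for example, the empty line right after def main(): ends the block before its body has started.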
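In the classic CPython interactive interpreter, a completely empty line ends the block being typed, while a line that holds only whitespace is ignored (other shells, such as IPython or the new REPL introduced in Python 3.13, may handle pasted code differently). A naive, line-based sketch of the fix follows; pad_blank_lines is a hypothetical helper, not from any library. It gives every empty line the indentation of the next non-empty line. Because it works line by line, it also pads empty lines inside triple-quoted strings (changing their contents), which is one reason to prefer a tokenize-based approach.

```python
def pad_blank_lines(code):
    """Hypothetical helper: give each empty line the indentation of the
    next non-empty line, so only block-ending blank lines stay empty.

    Naive sketch: it does not understand triple-quoted strings, so empty
    lines inside them are padded as well.
    """
    lines = code.split("\n")
    result = []
    for i, line in enumerate(lines):
        if line.strip():
            result.append(line)
            continue
        # Look ahead for the next line that actually contains code.
        indent = ""
        for following in lines[i + 1:]:
            if following.strip():
                indent = following[:len(following) - len(following.lstrip())]
                break
        result.append(indent)
    return "\n".join(result)


source = "def main():\n\n    x = 1\n\n    print(x)\n\nmain()\n"
print(repr(pad_blank_lines(source)))
```

Blank lines followed by top-level code stay completely empty, which is exactly what ends the def at the prompt.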

Is there a simple way to do this transformation with the generate_tokens / untokenize chain? I am a bit lost in the NL / NEWLINE / INDENT / DEDENT semantics.
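A short token dump makes these semantics visible. This is a sketch for Python 3 (it relies on io.StringIO and the .type/.start/.string attributes of the tokens that tokenize.generate_tokens yields). SOURCE, STRUCTURAL and dump_tokens are illustrative names, not part of the tokenize module.

```python
import io
import tokenize

# Illustrative input: a blank line, a comment-only line and nested blocks.
SOURCE = (
    "def main():\n"
    "\n"
    "    # a comment\n"
    "    if True:\n"
    "        print('do')\n"
    "    print('done')\n"
)

# Only the token types that describe line and block structure.
STRUCTURAL = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
              tokenize.DEDENT, tokenize.COMMENT}


def dump_tokens(source):
    """Return (token name, line number, text) for the structural tokens."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in STRUCTURAL:
            result.append((tokenize.tok_name[tok.type], tok.start[0], tok.string))
    return result


for name, row, text in dump_tokens(SOURCE):
    print(row, name, repr(text))
```

NEWLINE ends a logical line that contains code. NL marks a line break that does not end a statement: a blank line, a comment-only line (COMMENT followed by NL), or a break inside brackets. INDENT and DEDENT are only emitted at the start of the next line that contains code, never for blank or comment-only lines, so when you see an NL you cannot yet know which indentation that blank line should get; you have to wait for the next real token.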

I found the script "Script to remove Python comments/docstrings", which removes comments and docstrings. It looks like a good fit for my problem, but I cannot get it to produce clean output on complex code.
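The generate_tokens / untokenize round trip itself is short. A minimal sketch for Python 3, using a hypothetical helper strip_comments: drop the COMMENT tokens and hand the remaining full 5-tuples back to tokenize.untokenize, which uses their start/end positions to restore spacing, indentation and line breaks.

```python
import io
import tokenize


def strip_comments(source):
    """Hypothetical helper: remove comments via generate_tokens/untokenize.

    Full 5-tuples (not 2-tuples) are passed back, so untokenize() keeps the
    original positions of every remaining token.
    """
    tokens = [tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
              if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(tokens)


print(strip_comments("def main():\n    # comment\n    x = 1  # trailing\n    return x\n"))
```

Because positions are preserved, a comment-only line becomes a whitespace-only line and a trailing comment leaves trailing spaces behind; the result is still valid Python. Docstrings need more care: if a docstring is the only statement in a block, dropping its STRING token leaves the block empty.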


Solution

  • The best I could come up with (it resolved my issue):

    import tokenize
    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO        # Python 3

    def _python_interactive_indent(code):
        """Remove comments and docstrings, and turn blank lines into lines
        holding the current indentation (assumes 4-space indents), so the
        code can be pasted into an interactive session."""
        prev_toktype = tokenize.INDENT
        last_lineno = -1
        last_col = 0
        indent = 0          # current block depth
        has_nl = False      # a blank or comment-only line is pending
        output = ''

        tokgen = tokenize.generate_tokens(StringIO(code).readline)
        for toktype, ttext, (slineno, scol), (elineno, ecol), _ in tokgen:
            skip = False
            prefixed = False
            if toktype == tokenize.INDENT:
                indent += 1
            elif toktype == tokenize.DEDENT:
                indent -= 1
            if slineno > last_lineno:
                last_col = 0                  # started a new line
            if toktype == tokenize.NL:
                has_nl = True                 # blank or comment-only line
                skip = True
            elif toktype == tokenize.COMMENT:
                skip = True
            elif toktype == tokenize.STRING and prev_toktype == tokenize.INDENT:
                skip = True                   # string opening a block: docstring
            if not skip and has_nl and toktype not in (tokenize.INDENT, tokenize.DEDENT):
                # Emit the pending blank line, padded with the indentation
                # of the block that the next real token belongs to.
                has_nl = False
                output += "    " * indent + '\n' + "    " * indent
                prefixed = True
            if not skip:
                # Restore the spacing between tokens on the same line.
                if not prefixed and scol > last_col:
                    output += " " * (scol - last_col)
                output += ttext
            prev_toktype = toktype
            last_col = ecol
            last_lineno = elineno
        return output