I'm trying to extract the beginning and ending line numbers of all docstrings in a Python module. Is there a sensible way of doing this without regex?
The best way to do this is with the ast module. In particular, ast.get_docstring almost does what you want: it returns the content of the docstring rather than the node, but you can use the same algorithm to find the docstring node and its location:
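import ast

# Written for Python 3.8+, where string literals parse to ast.Constant.
root = ast.parse('''
def foo():
    """the foo function"""
    pass
''')
for node in ast.walk(root):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Module)):
        # A docstring is a bare string expression as the first statement of the body.
        if (node.body and isinstance(node.body[0], ast.Expr) and
                isinstance(node.body[0].value, ast.Constant) and
                isinstance(node.body[0].value.value, str)):
            # ast.Module has no lineno attribute, so fall back to None for it.
            print(getattr(node, 'lineno', None), node.body[0].value.lineno, node.body[0].value.value)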
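What lineno means for the docstring node depends on the Python version. Before Python 3.8, and although undocumented, the lineno property of a multi-line string node gives the last line of the string, so only the end of the docstring is known directly. The lineno of the parent node is the line of the class or def keyword, so the docstring starts on that line or on a later one. On those versions it doesn't look like there's an easy way to tell the difference between a docstring starting on the same line as the class or def keyword and one starting on the following line, especially when you consider line continuation (\) characters. From Python 3.8 onward, lineno is the first line of a node and the new end_lineno property is the last, so the docstring node gives both numbers directly.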
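On Python 3.8 and later, where AST nodes carry an end_lineno attribute alongside lineno, the same walk can report the first and last line of each docstring. Here is a minimal sketch under that assumption; the names docstring_spans, DOCSTRING_OWNERS and SAMPLE are made up for illustration and are not part of the ast module:

```python
import ast

# Node types that can own a docstring (the same set ast.get_docstring accepts).
DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_spans(source):
    """Return (owner_name, first_line, last_line) for each docstring in source."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string constant as the first statement of the body.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Modules have no name attribute, so use a placeholder label.
            name = getattr(node, "name", "<module>")
            # Assumes Python 3.8+: end_lineno does not exist on older versions.
            spans.append((name, first.value.lineno, first.value.end_lineno))
    return spans


SAMPLE = '''"""Module docstring."""

def foo():
    """The foo function.

    Spans several lines.
    """
    pass

def baz(): "same-line docstring"
'''

for name, start, end in docstring_spans(SAMPLE):
    print(name, start, end)
```

Because the numbers come from the string node itself, a docstring on the same line as def or class needs no special handling here. On older versions you would still have to fall back on the parent node's lineno as described above.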