Tags: python, regex, abstract-syntax-tree, parse-tree

Python regex: exclude some results


Here is a test string:

Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])

I am trying to get all loaded variable names from it. My regex is:

[a-z]+(?=', Load)

It matches the loaded names, but it also finds built-in functions such as print and range. How can I exclude them? The names to exclude are always preceded by

Call(Name(' 

I tried

 (?=Call\(Name\(')[a-z]+(?=', Load)

but it returns an empty list.

My code is:

import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''
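# All names followed by "', Load", including print and range
print(re.findall(r"[a-z]+(?=', Load)", test))
# Attempt to skip names that follow "Call(Name('"; prints []
print(re.findall(r"(?=Call\(Name\(')[a-z]+(?=', Load)", test))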
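Why the second pattern finds nothing: (?=...) is a lookahead, so it checks the text that follows the current position, not the text before it. It therefore demands that "Call(Name('" start exactly where [a-z]+ starts, and the uppercase "C" can never satisfy [a-z]. The minimal sketch below, run on a fragment copied from the test string (the variable names are just for illustration), contrasts it with a lookbehind (?<=...), which checks the preceding text.

```python
import re

# A fragment of the ast.dump() string from the question
s = "Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))"

# Lookahead: "Call(Name('" must begin at the same spot as [a-z]+,
# but it begins with an uppercase "C", so there is never a match.
ahead = re.findall(r"(?=Call\(Name\(')[a-z]+", s)

# Lookbehind: "Call(Name('" must end right before [a-z]+,
# so this selects the name of the called function.
behind = re.findall(r"(?<=Call\(Name\(')[a-z]+", s)

print(ahead)   # []
print(behind)  # ['print']
```

Python's re module only accepts fixed-width lookbehinds; Call\(Name\(' qualifies because it always matches exactly 11 characters. Negating that lookbehind is what excludes the called names.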

Solution

  • Use a negative lookbehind and a word boundary:

    (?<!Call\(Name\(')\b\w+\b(?=', Load)
    

    See the demo: https://regex101.com/r/hdxlQ8/1
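A quick way to check the answer's pattern in Python is to run it on the question's string with re.findall. The sketch below does that. Deduplicating with set() is an extra step that the answer does not include. The leading \b matters: without it, the engine could start one character later and return "rint" from "print", because that position is no longer directly preceded by "Call(Name('".

```python
import re

# The ast.dump() string from the question
test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''

# (?<!Call\(Name\(')  negative lookbehind: reject names that directly follow "Call(Name('"
# \b\w+\b             a whole identifier, so no suffix like "rint" can match
# (?=', Load)         lookahead: the identifier must be followed by "', Load"
pattern = re.compile(r"(?<!Call\(Name\(')\b\w+\b(?=', Load)")

names = pattern.findall(test)
print(names)               # ['a', 'b', 'a', 'a', 'a']
print(sorted(set(names)))  # ['a', 'b']
```

This stays tied to the exact ast.dump() text, which varies between Python versions (newer releases use Constant instead of Num and Str, for example). If the original source is available, walking the tree with ast.walk and keeping ast.Name nodes whose ctx is ast.Load is likely more robust than a regex.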