I want to create a script that looks inside a Python file and finds all import
statements. Possible variations include the following:
import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *
By looking at these, one could argue that the logic should be: get me all <foo> from "from <foo>" statements, and all <bar> from "import <bar>" statements that are not preceded by "from <foo>".
To translate the above into a regex, I wrote:
from (\w+)|(?<!from \w+)import (\w+)
The problem seems to be the non-fixed width of the negative lookbehind (Python's re module only supports fixed-width lookbehinds, and \w+ can match any number of characters), but I can't find a way around it.
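A quick way to confirm that diagnosis, assuming the standard library re module: the pattern is rejected when it is compiled, before any matching is attempted.

```python
import re

# The pattern from the question: the lookbehind contains \w+, which can match
# any number of characters, so the lookbehind has no fixed width.
pattern = r"from (\w+)|(?<!from \w+)import (\w+)"

try:
    re.compile(pattern)
except re.error as exc:
    # Python's re module rejects variable-width lookbehinds at compile time
    print("re.error:", exc)
```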
EDIT:
As a bonus, it would also be nice to capture multiple imports on one line, as in:
import sys, glob
It seems you only want to extract the matches at the start of a line, allowing for leading whitespace.
You may consider using
^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)
See the regex demo.
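A minimal sketch of applying the pattern to the question's sample lines with re.findall. Since the pattern has a single capturing group, findall returns the Group 1 text of each match; re.M is needed so that ^ matches at the start of every line rather than only at the start of the string.

```python
import re

# The answer's pattern; re.M makes ^ match at the start of every line.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)", re.M)

source = """\
import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *
import sys, glob
"""

# One capturing group, so findall returns a list of Group 1 strings
print(IMPORT_RE.findall(source))
# ['os', 'numpy', 'itertools', 'collections', 'pandas', 'sys, glob']
```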
Details
^ - start of string (use re.M to also match the start of a line)
\s* - 0+ whitespaces (use [^\S\r\n]* to only match horizontal whitespace)
(?:from|import) - either of the two words
\s+ - 1+ whitespaces
(\w+(?:\s*,\s*\w+)*) - Group 1: 1 or more word chars, followed by 0+ occurrences of 0+ whitespaces, a comma, 0+ whitespaces and then 1+ word chars.
In Python, you may later split the Group 1 value with re.split(r'\s*,\s*', group_1_value) to get the individual comma-separated module names.
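Putting the match and the split together, a minimal sketch (the helper name imported_modules is hypothetical, chosen only for illustration) that iterates over the matches and splits each Group 1 value on commas:

```python
import re

IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)", re.M)


def imported_modules(source):
    """Hypothetical helper: module names from import/from lines of *source*."""
    names = []
    for match in IMPORT_RE.finditer(source):
        # Group 1 may hold several comma-separated names, e.g. "sys, glob"
        names.extend(re.split(r"\s*,\s*", match.group(1)))
    return names


code = "import sys, glob\n    from os import path\nimport numpy as np\n"
print(imported_modules(code))
# ['sys', 'glob', 'os', 'numpy']
```

Note that \w does not match a dot, so for a line like import os.path only os is captured.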