Tags: python, regex, python-3.x, negative-lookbehind

Regex to capture all import statements


I want to create a script that looks inside a Python file and finds all import statements. Possible variations of those are the following:

import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *

By looking at these, one could argue that the logic should be:

Get me every <foo> from "from <foo>" statements, and every <bar> from "import <bar>" statements that are not preceded by "from <foo>".

To translate the above into a regex, I wrote:

from (\w+)|(?<!from \w+)import (\w+)

The problem seems to be the variable width of the negative lookbehind (\w+ can match any number of characters, and Python's re module only accepts fixed-width lookbehinds), but I cannot work out how to fix it.
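The failure can be reproduced with a few lines. This is only a minimal sketch using the standard-library re module; the variable names PATTERN and error_message are made up for the illustration.

```python
import re

# The pattern from the question. The lookbehind contains \w+, so its width varies.
PATTERN = r"from (\w+)|(?<!from \w+)import (\w+)"

try:
    re.compile(PATTERN)
    error_message = None
except re.error as exc:
    # The standard-library engine rejects the pattern at compile time,
    # before any text is searched.
    error_message = str(exc)

print(error_message)
```

So the pattern never gets as far as matching anything: compilation itself fails.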

EDIT:

As a bonus, it would also be nice to capture multiple imports on one line, as in:

import sys, glob

Solution

  • It seems you only want to extract the matches from the start of a line, taking into account the leading whitespace.

    You may consider using

    ^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)
    


    Details

    • ^ - start of string (use re.M to also match start of a line)
    • \s* - 0+ whitespaces (use [^\S\r\n]* to only match horizontal whitespace)
    • (?:from|import) - either of the two words
    • \s+ - 1+ whitespaces
    • (\w+(?:\s*,\s*\w+)*) - Group 1: 1+ word chars, followed by 0+ occurrences of 0+ whitespaces, a comma, 0+ whitespaces and then 1+ word chars.
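To illustrate the note about re.M, here is a small sketch that runs the pattern with and without the flag. The SOURCE text is an assumed sample built from the statements in the question, and the variable names are illustrative.

```python
import re

# Assumed sample input; the last line is indented to exercise the leading \s*.
SOURCE = """import os
import numpy as np
from itertools import accumulate
    import sys, glob
"""

PATTERN = r"^\s*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)"

# Without re.M, ^ anchors only at the very start of the string.
first_only = re.findall(PATTERN, SOURCE)

# With re.M, ^ also anchors right after every newline.
every_line = re.findall(PATTERN, SOURCE, flags=re.M)

print(first_only)   # ['os']
print(every_line)   # ['os', 'numpy', 'itertools', 'sys, glob']
```

Note that the comma-separated names still arrive as one string ('sys, glob') at this stage.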

    In Python, you may later split the Group 1 value with re.split(r'\s*,\s*', group_1_value) to get individual comma-separated module names.
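Putting the pieces together, a possible sketch looks like the following. The helper name find_imported_modules and the SAMPLE text are hypothetical, and the pattern uses the [^\S\r\n]* variant mentioned above so that the leading whitespace cannot span lines.

```python
import re

# Same pattern as above, compiled once with re.M so ^ matches at each line start.
IMPORT_RE = re.compile(
    r"^[^\S\r\n]*(?:from|import)\s+(\w+(?:\s*,\s*\w+)*)",
    re.M,
)


def find_imported_modules(source):
    """Return the module names captured in Group 1, one list entry per name."""
    names = []
    for match in IMPORT_RE.finditer(source):
        # Group 1 may hold several comma-separated names, e.g. "sys, glob".
        names.extend(re.split(r"\s*,\s*", match.group(1)))
    return names


SAMPLE = """import os
import numpy as np
from itertools import accumulate
from collections import Counter as C
from pandas import *
import sys, glob
"""

print(find_imported_modules(SAMPLE))
# ['os', 'numpy', 'itertools', 'collections', 'pandas', 'sys', 'glob']
```

Because \w+ does not match a dot, a statement such as import os.path would yield only os with this pattern.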