How to get a list of unique python modules imported from different scripts located in different folders


I have a project contained in a folder (src). It is divided into several subfolders, each containing some .py scripts. An example of the project structure is the following:

├── src                <- Source code for use in this project.
│   │
│   ├── data           <- Scripts to download or generate data.
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling.
│   │
│   ├── models         <- Scripts to train models and then use trained models to make predictions.

As mentioned above, each folder under src contains several Python scripts. For example:

├── data               
   │
   ├── create_dataframe.py
   ├── imputation.py
.
.
.

Is there a way to get a list of all the unique Python modules imported across these scripts?

For example, if I have in create_dataframe.py

import pandas as pd

and in imputation.py I have

import pandas as pd
import numpy  as np
from   tqdm import tqdm

The desired output would be

Output: ['pandas', 'numpy', 'tqdm']

I am not interested in the module's version. I just want the name of the module.


Solution

  • Run this script from the src folder:

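    import pathlib
    
    # Start from the current working directory (run this from src).
    path = pathlib.Path.cwd()
    modules = set()  # a set keeps each module name only once
    
    # Recursively visit every .py file below the starting folder.
    for file in path.glob('**/*.py'):
        lines = file.read_text(encoding='utf-8')
        for line in lines.split('\n'):
            # Only unindented lines that begin with "import " or "from " are matched.
            if line.startswith('import ') or line.startswith('from '):
                # Take the first name after the keyword and drop any trailing ';' or ','.
                # Note: for "import a, b" only "a" is recorded.
                modules.add(line.split()[1].replace(';','').replace(',',''))
    
    # Print the unique module names in alphabetical order.
    for module in sorted(modules):
        print(module)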
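
  • The line-based scan above can miss indented imports and the extra names in "import a, b", and it can pick up docstring lines that happen to start with "from ". A possible alternative, not part of the original answer, is to parse each file with the standard-library ast module. The sketch below is only illustrative: the helper names top_level_imports and collect_modules are hypothetical, and skipping relative imports and unparseable files is an assumed policy, not a requirement from the question.

```python
import ast
import pathlib


def top_level_imports(source):
    """Return the top-level names of all absolute imports in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c, d" contributes "a" and "d"
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import ("from . import x"); skipped here
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return names


def collect_modules(root):
    """Scan every .py file under `root` and return the sorted unique names."""
    modules = set()
    for file in pathlib.Path(root).glob('**/*.py'):
        try:
            modules |= top_level_imports(file.read_text(encoding='utf-8'))
        except (SyntaxError, ValueError):
            # Assumed policy: ignore files that cannot be decoded or parsed
            continue
    return sorted(modules)
```

    Calling collect_modules(pathlib.Path.cwd()) from the src folder returns the names as a sorted list. Because only the part before the first dot is kept, an import such as "import os.path" is reported as "os"; standard-library modules are included alongside third-party ones.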