Parse PyLint output using Python script

I am trying to write a simple pylint parser that, given a Python project, extracts information on code smells, project names, and scores.

In particular, the script must analyze each Python file and generate a DataFrame with the information listed above (code smells, project name, and score). I wrote the following snippet:

import os
import pandas as pd
from pylint.lint import Run


def pylint_project(project_name):


    global project_df
    pylint_options = ["--disable=F0010"]
    python_files = [f for f in os.listdir(project_name) if f.endswith('.py')]
    for file in python_files:
        file_path = os.path.join(project_name, file)
        pylint_output = Run([file_path] + pylint_options)
        smell_count = pylint_output.lstrip().split()[1]
        score = pylint_output.split()[-2]
        project_df = pd.DataFrame({
            "project_name": [project_name],
            "smell_count": [smell_count],
            "score": [score]
        })

    return project_df


path = "path/to/analyze"
com = pylint_project(path)
com.to_csv("path/to/save")

However, this snippet doesn't work correctly. It only prints:

********* Module setup
E:\python_projects\machine_learning_projects\alibi\setup.py:17:0: C0301: Line too long (110/100) (line-too-long)
E:\python_projects\machine_learning_projects\alibi\setup.py:1:0: C0114: Missing module docstring (missing-module-docstring)
E:\python_projects\machine_learning_projects\alibi\setup.py:4:0: C0116: Missing function or method docstring (missing-function-docstring)
E:\python_projects\machine_learning_projects\alibi\setup.py:5:48: C0103: Variable name "f" doesn't conform to snake_case naming style (invalid-name)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:0: W0122: Use of exec (exec-used)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:5: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:5: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
E:\python_projects\machine_learning_projects\alibi\setup.py:34:18: E0602: Undefined variable '__version__' (undefined-variable)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00

The DataFrame is never saved, and the script seems to analyze only a single file (setup.py).

How can I fix it?

Solution

The following script may include more information than you actually need, so adapt it to your needs. In particular, I don't know whether you also want the individual code smells, so I put them in a DataFrame of their own.

First, notice the use of glob: with the ** pattern and recursive=True it returns the matching files in all subfolders, whereas os.listdir only lists the entries directly inside the given folder. If the project folder contains a virtual environment, you will need some condition to avoid running pylint on the files inside it.
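A minimal sketch of such a condition, assuming the virtual environment lives in a folder with one of a few common names. The EXCLUDED_DIRS set and the find_python_files helper are hypothetical; adapt them to your project layout.

```python
import os
from glob import glob

# Hypothetical set of folder names to skip; adjust to your project layout.
EXCLUDED_DIRS = {"venv", ".venv", "env", ".tox", "build"}


def find_python_files(path, excluded=EXCLUDED_DIRS):
    """Return all .py files under path, skipping files inside an excluded folder."""
    pattern = os.path.join(path, "**", "*.py")
    files = []
    for filepath in glob(pattern, recursive=True):
        # Split the path (relative to the project root) into folder components;
        # the last component is the file name itself.
        parts = os.path.relpath(filepath, path).split(os.sep)
        if not excluded.intersection(parts[:-1]):
            files.append(filepath)
    return sorted(files)
```

In the loop of the script below, `for filepath in find_python_files(path):` would then replace the direct `glob(...)` call.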

The use of StringIO for capturing the output of pylint is already pointed out in some other threads, e.g., here.

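I use the JSONReporter to get easy-to-parse output. For the score value, see this answer. Note also the do_exit=False argument: by default Run calls sys.exit() as soon as it has finished linting its arguments, which is why your loop stopped after setup.py and never reached to_csv. In pylint 3.x the do_exit parameter was removed, so pass exit=False there instead.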
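For reference, the score stored in global_note comes from pylint's `evaluation` option. Below is a minimal sketch of that formula as I understand the default in recent pylint versions (errors weigh five times as much as other messages, and the result is clamped at 0, which matches the 0.00/10 in the question). The default expression has changed between versions, so check the `evaluation` option for the version you run; pylint_score is a hypothetical helper, not part of pylint's API.

```python
def pylint_score(error, warning, refactor, convention, statement, fatal=0):
    """Sketch of pylint's default `evaluation` formula (assumed; verify for your version)."""
    if fatal:
        # A fatal message drops the score to 0.
        return 0.0
    # Errors count five times; every other message counts once, per statement.
    penalty = (5 * error + warning + refactor + convention) / statement * 10
    return max(0.0, 10.0 - penalty)


# One warning and seven conventions, as in the answer's output further down.
# 26 statements is an inferred value: it is what reproduces the reported score.
print(round(pylint_score(error=0, warning=1, refactor=0, convention=7, statement=26), 6))
# 6.923077
```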

For large projects, consider wrapping the for-loop in tqdm to get a progress bar.
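A minimal sketch of that, assuming tqdm is installed (`pip install tqdm`); the fallback that simply returns the iterable and the iter_python_files generator are hypothetical additions so the sketch still runs without tqdm.

```python
import os
from glob import glob

try:
    from tqdm import tqdm  # third-party progress bar
except ImportError:
    # Hypothetical fallback: iterate without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def iter_python_files(path):
    """Yield every .py file under path, showing a progress bar if tqdm is available."""
    files = glob(os.path.join(path, "**", "*.py"), recursive=True)
    # tqdm takes the total from len(files); desc labels the bar.
    yield from tqdm(files, desc="Linting")
```

Inside pylint_project, the same idea is just `for filepath in tqdm(glob(glob_pattern, recursive=True), desc="Linting"):`.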

from pylint.reporters import JSONReporter
from pylint.lint import Run
from glob import glob
from io import StringIO
import pandas as pd
import json
import os


def pylint_project(path):
    pylint_options = ["--disable=F0010"]
    pylint_overview = []
    pylint_results = []
    glob_pattern = os.path.join(path, "**", "*.py")
    for filepath in glob(glob_pattern, recursive=True):
        reporter_buffer = StringIO()
        results = Run([filepath] + pylint_options, reporter=JSONReporter(reporter_buffer), do_exit=False)
        score = results.linter.stats.global_note
        file_results = json.loads(reporter_buffer.getvalue())
        pylint_results.extend(file_results)
        pylint_overview.append({
            "filepath": os.path.realpath(filepath),
            "smell_count": len(file_results),
            "score": score
        })
    return pd.DataFrame(pylint_overview), pd.DataFrame(pylint_results)


if __name__ == "__main__":
    overview, results = pylint_project(".")
    print("### Overview")
    print(overview)
    print("\n### All Results")
    print(results)
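Since pylint_project returns DataFrames, `overview.to_csv("path/to/save", index=False)` already saves the per-file table. If you want a single row per project, as in the question, here is a minimal standard-library sketch that collapses the per-file rows (the dicts collected in pylint_overview). The summarize_project and save_rows helpers are hypothetical, and the averaged score is a simple unweighted mean of the file scores, not the score pylint would report if it linted the whole project in one run.

```python
import csv
import os
from statistics import mean


def summarize_project(project_path, overview_rows):
    """Collapse per-file rows (filepath, smell_count, score) into one project-level row."""
    return {
        # Last folder name of the project path, e.g. "alibi".
        "project_name": os.path.basename(os.path.normpath(project_path)),
        "smell_count": sum(row["smell_count"] for row in overview_rows),
        # Unweighted mean of the per-file scores (an assumption, see above).
        "score": mean(row["score"] for row in overview_rows),
    }


def save_rows(rows, csv_path):
    """Write a list of dicts to csv_path, using the first dict's keys as the header."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

With the DataFrame from the script, `summarize_project(path, overview.to_dict("records"))` produces such a row.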

Output of the script above:

### Overview
                    filepath  smell_count     score
0  /path/to/pylint_parser.py            8  6.923077

### All Results
         type         module             obj  line  column  endLine  endColumn              path                      symbol                                            message message-id
0  convention  pylint_parser                    17       0      NaN        NaN  pylint_parser.py               line-too-long                            Line too long (105/100)      C0301
1  convention  pylint_parser                     1       0      NaN        NaN  pylint_parser.py    missing-module-docstring                           Missing module docstring      C0114
2  convention  pylint_parser  pylint_project    10       0     10.0       18.0  pylint_parser.py  missing-function-docstring               Missing function or method docstring      C0116
3     warning  pylint_parser  pylint_project    17       8     17.0       15.0  pylint_parser.py        redefined-outer-name  Redefining name 'results' from outer scope (li...      W0621
4  convention  pylint_parser                     3       0      3.0       21.0  pylint_parser.py          wrong-import-order  standard import "from glob import glob" should...      C0411
5  convention  pylint_parser                     4       0      4.0       23.0  pylint_parser.py          wrong-import-order  standard import "from io import StringIO" shou...      C0411
6  convention  pylint_parser                     6       0      6.0       11.0  pylint_parser.py          wrong-import-order  standard import "import json" should be placed...      C0411
7  convention  pylint_parser                     7       0      7.0        9.0  pylint_parser.py          wrong-import-order  standard import "import os" should be placed b...      C0411
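If by code smells you mean a per-category breakdown rather than just a total, the JSON records can be tallied directly. A minimal standard-library sketch; count_messages is a hypothetical helper, and the sample records are trimmed copies of the rows in the table above, keeping only the `type` and `symbol` fields.

```python
import json
from collections import Counter


def count_messages(json_text, key="type"):
    """Count pylint JSON messages by a field such as "type", "symbol" or "message-id"."""
    return Counter(message[key] for message in json.loads(json_text))


# Trimmed records mirroring the table above, as reporter_buffer.getvalue() might hold them.
conventions = [
    "line-too-long", "missing-module-docstring", "missing-function-docstring",
    "wrong-import-order", "wrong-import-order", "wrong-import-order", "wrong-import-order",
]
sample = json.dumps(
    [{"type": "convention", "symbol": s} for s in conventions]
    + [{"type": "warning", "symbol": "redefined-outer-name"}]
)
print(count_messages(sample))                           # Counter({'convention': 7, 'warning': 1})
print(count_messages(sample, "symbol").most_common(1))  # [('wrong-import-order', 4)]
```

With the DataFrame returned by the script, `results["type"].value_counts()` gives the same breakdown.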