I am trying to write a simple pylint parser that, given a Python project, extracts code smells, class names, and scores.
In particular, the script must analyze each Python file and generate a data frame with this information (code smells, project name, and score). I wrote the following snippet:
import os
import pandas as pd
from pylint.lint import Run


def pylint_project(project_name):
    global project_df
    pylint_options = ["--disable=F0010"]
    python_files = [f for f in os.listdir(project_name) if f.endswith('.py')]
    for file in python_files:
        file_path = os.path.join(project_name, file)
        pylint_output = Run([file_path] + pylint_options)
        smell_count = pylint_output.lstrip().split()[1]
        score = pylint_output.split()[-2]
        project_df = pd.DataFrame({
            "project_name": [project_name],
            "smell_count": [smell_count],
            "score": [score]
        })
    return project_df


path = "path/to/analyze"
com = pylint_project(path)
com.to_csv("path/to/save")
However, this snippet doesn't work correctly. It only prints:
********* Module setup
E:\python_projects\machine_learning_projects\alibi\setup.py:17:0: C0301: Line too long (110/100) (line-too-long)
E:\python_projects\machine_learning_projects\alibi\setup.py:1:0: C0114: Missing module docstring (missing-module-docstring)
E:\python_projects\machine_learning_projects\alibi\setup.py:4:0: C0116: Missing function or method docstring (missing-function-docstring)
E:\python_projects\machine_learning_projects\alibi\setup.py:5:48: C0103: Variable name "f" doesn't conform to snake_case naming style (invalid-name)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:0: W0122: Use of exec (exec-used)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:5: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
E:\python_projects\machine_learning_projects\alibi\setup.py:10:5: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
E:\python_projects\machine_learning_projects\alibi\setup.py:34:18: E0602: Undefined variable '__version__' (undefined-variable)
------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00
However, the data set is not saved, and it seems that only a single file (setup.py) is analyzed.
How can I fix it?
The following script may include more information than you actually want to use, so please adapt it to your needs. In particular, I don't know whether you also want the code smells themselves, so I include them in their own DataFrame.
First, notice the use of glob, which, in contrast to os.listdir, can return all files in a folder recursively (with recursive=True and a ** pattern). If the project folder contains a virtual environment folder, you will need a condition to avoid running pylint on it.
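One possible way to skip virtual environment folders is to filter the glob results by their parent directory names. This is only a sketch: the function name find_python_files and the directory names in EXCLUDED_DIRS are assumptions for illustration, not part of the script in this answer, so adjust them to your project layout.

```python
import os
from glob import glob

# Hypothetical set of directory names to skip; adjust to your project layout.
EXCLUDED_DIRS = {"venv", "env", "site-packages", "__pycache__"}


def find_python_files(path, excluded_dirs=EXCLUDED_DIRS):
    """Return all .py files under path, skipping excluded directories."""
    pattern = os.path.join(path, "**", "*.py")
    selected = []
    for filepath in glob(pattern, recursive=True):
        # Split the path relative to the project root into its components.
        parts = os.path.relpath(filepath, path).split(os.sep)
        # Keep the file only if none of its parent folders is excluded.
        if not excluded_dirs.intersection(parts[:-1]):
            selected.append(filepath)
    return sorted(selected)
```

The returned list could then be iterated over in place of the raw glob call.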
The use of StringIO for capturing the output of pylint has already been pointed out in other threads, e.g., here.
I use the JSONReporter to get an easy-to-parse output. For the score value, see this answer.
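The buffer-then-parse pattern can be tried out without pylint. In the sketch below, fake_reporter is a hypothetical stand-in for a reporter that writes JSON to whatever text stream it is given, and the message it writes is made up for illustration.

```python
import json
from io import StringIO


def fake_reporter(messages, output):
    """Hypothetical stand-in for a reporter: writes messages as JSON to output."""
    json.dump(messages, output)


buffer = StringIO()  # in-memory text stream used instead of stdout
fake_reporter(
    [{"symbol": "line-too-long", "message-id": "C0301", "line": 17}],
    buffer,
)

# getvalue() returns everything written to the buffer so far as one string.
parsed = json.loads(buffer.getvalue())
smell_count = len(parsed)
```

Because the parsed result is a list of dictionaries, counting the smells is just a len() call, and the same list can be passed to pd.DataFrame.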
Consider using tqdm with the for-loop.
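A minimal sketch of wrapping a loop in tqdm to get a progress bar. The names lint_files and lint_one are hypothetical: lint_one stands for whatever is done per file (for example, one pylint run). The fallback only keeps the sketch runnable when tqdm is not installed.

```python
try:
    from tqdm import tqdm  # third-party package: pip install tqdm
except ImportError:
    # Fall back to a plain loop if tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def lint_files(filepaths, lint_one):
    """Apply lint_one to every file path while showing a progress bar."""
    results = []
    for filepath in tqdm(filepaths, desc="Linting", unit="file"):
        results.append(lint_one(filepath))
    return results
```

tqdm writes the bar to stderr by default, so it should not interfere with output captured in a separate buffer.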
from pylint.reporters import JSONReporter
from pylint.lint import Run
from glob import glob
from io import StringIO
import pandas as pd
import json
import os


def pylint_project(path):
    pylint_options = ["--disable=F0010"]
    pylint_overview = []
    pylint_results = []
    glob_pattern = os.path.join(path, "**", "*.py")
    for filepath in glob(glob_pattern, recursive=True):
        reporter_buffer = StringIO()
        results = Run([filepath] + pylint_options, reporter=JSONReporter(reporter_buffer), do_exit=False)
        score = results.linter.stats.global_note
        file_results = json.loads(reporter_buffer.getvalue())
        pylint_results.extend(file_results)
        pylint_overview.append({
            "filepath": os.path.realpath(filepath),
            "smell_count": len(file_results),
            "score": score
        })
    return pd.DataFrame(pylint_overview), pd.DataFrame(pylint_results)


if __name__ == "__main__":
    overview, results = pylint_project(".")
    print("### Overview")
    print(overview)
    print("\n### All Results")
    print(results)
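Since the question asks for a project name, a smell count, and a score saved to a file, here is one possible way to collapse a per-file overview into a single project-level row and write it to CSV. This is a sketch under assumptions: summarize_project, the file_count and mean_score columns, and the output file name are hypothetical, the input values are made up, and the mean of the per-file scores is not the same as the score pylint would report for the whole project.

```python
import os

import pandas as pd


def summarize_project(overview, project_path):
    """Collapse the per-file overview into a single project-level row."""
    return pd.DataFrame([{
        "project_name": os.path.basename(os.path.abspath(project_path)),
        "file_count": len(overview),
        "smell_count": int(overview["smell_count"].sum()),
        # Unweighted mean of the per-file scores, not pylint's own project score.
        "mean_score": float(overview["score"].mean()),
    }])


# Made-up per-file values, for illustration only.
overview = pd.DataFrame([
    {"filepath": "/path/to/project/a.py", "smell_count": 8, "score": 6.0},
    {"filepath": "/path/to/project/b.py", "smell_count": 2, "score": 9.0},
])
summary = summarize_project(overview, "/path/to/project")

# index=False keeps the row index out of the CSV file.
summary.to_csv("pylint_summary.csv", index=False)
```

The per-file DataFrame can be saved the same way if the individual files matter more than the project total.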
Output of the script above:
### Overview
                    filepath  smell_count     score
0  /path/to/pylint_parser.py            8  6.923077

### All Results
         type         module             obj  line  column  endLine  endColumn              path                      symbol                                            message  message-id
0  convention  pylint_parser                    17       0      NaN        NaN  pylint_parser.py               line-too-long                            Line too long (105/100)       C0301
1  convention  pylint_parser                     1       0      NaN        NaN  pylint_parser.py    missing-module-docstring                           Missing module docstring       C0114
2  convention  pylint_parser  pylint_project    10       0     10.0       18.0  pylint_parser.py  missing-function-docstring               Missing function or method docstring       C0116
3     warning  pylint_parser  pylint_project    17       8     17.0       15.0  pylint_parser.py        redefined-outer-name  Redefining name 'results' from outer scope (li...       W0621
4  convention  pylint_parser                     3       0      3.0       21.0  pylint_parser.py          wrong-import-order  standard import "from glob import glob" should...       C0411
5  convention  pylint_parser                     4       0      4.0       23.0  pylint_parser.py          wrong-import-order  standard import "from io import StringIO" shou...       C0411
6  convention  pylint_parser                     6       0      6.0       11.0  pylint_parser.py          wrong-import-order  standard import "import json" should be placed...       C0411
7  convention  pylint_parser                     7       0      7.0        9.0  pylint_parser.py          wrong-import-order  standard import "import os" should be placed b...       C0411