I'm attempting to write a FileAnalyzer class that would search a directory for Python files and provide details of each Python file in the form of a PrettyTable. I'm interested in the number of classes, functions, lines, and characters in each Python file.
Learning the ropes of OOP...here's the code I have so far:
class FileAnalyzer:
def __init__(self, directory: str) -> None:
"""
The files_summary attribute stores the summarized data for each Python file in the specified directory.
"""
self.directory: str = os.listdir(directory) #Directory to be scanned
self.analyze_files() # summarize the python files data
self.files_summary: Dict[str, Dict[str, int]] = {
dir: {
'Number of Classes': cls,
'Number of Functions': funccount,
'Number of Lines of Code': codelines,
'Number of Characters': characters
}
}
def analyze_files(self) -> None:
"""
This method scans a directory for python files. For every python file, it determines the number of classes,
functions, lines of code, and characters. The count for each one is returned in a tuple.
"""
for dir in self.directory:
if dir.endswith('.py'): # Check for python files
with open(dir, "r") as pyfile:
cls = 0 # Initialize classes count
for line in pyfile:
if line.startswith('Class'):
cls += 1
funccount = 0 # Initialize function count
for line in pyfile:
if line.startswith('def'):
funccount += 1
#Get number of lines of code
i = -1 #Account for empty files
for i, line in enumerate(pyfile):
pass
codelines = i + 1
#Get number of characters
characters = 0
characters += sum(len(line) for line in pyfile)
return [cls, funccount, codelines, characters]
def pretty_print(self) -> None:
"""
This method creates a table with the desired counts from the Python files using the PrettyTable module.
"""
pt: PrettyTable = PrettyTable(field_names=['# of Classes', '# of Functions', '# Lines of Code (Excluding Comments)',
'# of characters in file (Including Comments)'])
for cls, funccount, codelines, characters in self.files_summary():
pt.add_row([cls, funccount, codelines, characters])
print(pt)
FileAnalyzer('/path/to/directory/withpythonfiles')
Currently getting an error of NameError: name 'cls' is not defined
when I try to run the code. Is calling self.analyze_files()
within __init__
not enough to pass the returned values into __init__
? Ideally, for a python file of
def func1():
pass
def func2():
pass
class Foo:
def __init__(self):
pass
class Bar:
def __init__(self):
pass
if __name__ == "__main__":
main()
The PrettyTable would tell me that there are 2 classes, 4 functions, 25 lines, and 270 characters. For a file of:
definitely not function
This is def not a function def
The PrettyTable would tell me that the file has 0 functions. I would like self.analyze_files()
to populate summarized data into self.files_summary
without passing any other arguments to analyze_files()
. And likewise passing the data from files_summary
to pretty_print
without a separate argument passed to pretty_print
.
Edit:
self.files_summary: Dict[str, Dict[str, int]] = {
dir: {
'Number of Classes': self.analyze_files()[0],
'Number of Functions': self.analyze_files()[1],
'Number of Lines of Code': self.analyze_files()[2],
'Number of Characters': self.analyze_files()[3]
}
}
suppressed the errors, but
for self.analyze_files()[0], self.analyze_files()[1], self.analyze_files()[2], self.analyze_files()[3] in self.files_summary():
pt.add_row([self.analyze_files()[0], self.analyze_files()[1], self.analyze_files()[2], self.analyze_files()[3]])
return pt
in pretty_print
doesn't do anything when I call FileAnalyzer...
This question is a bit broad, so it's hard to provide a concise answer. You said in comments:
If I do something like
[cls, funccount, codelines, characters] = self.analyze_files()
within init, doesn't seem to reference the returned values properly either
While a little odd stylistically, that's actually perfectly fine syntax. If your __init__
method looks like this, it runs without error:
def __init__(self, directory: str) -> None:
"""
The files_summary attribute stores the summarized data for each Python file in the specified directory.
"""
self.directory: str = os.listdir(directory) #Directory to be scanned
[cls, funccount, codelines, characters] = self.analyze_files()
self.files_summary: Dict[str, Dict[str, int]] = {
dir: {
'Number of Classes': cls,
'Number of Functions': funccount,
'Number of Lines of Code': codelines,
'Number of Characters': characters
}
}
There are a number of issues, however. First, in the above method, you're using the variable name dir
, but there is no such variable in scope. Unfortunately, dir
is also the name of a Python built-in function. If you insert a breakpoint()
after this section of code and print the value of self.files_summary
, you'll see it looks something like this:
{<built-in function dir>: {'Number of Classes': 0, 'Number of Functions': 0, 'Number of Lines of Code': 0, 'Number of Characters': 0}}
In general, never pick variable names that shadow Python built-ins, because it can cause unexpected and hard to debug problems. If you use a decent editor with Python syntax highlighting support you will see these built-ins called out so that you can avoid this mistake.
I think rather than dir
you mean self.directory
(or just directory
, because that variable is in scope at this point).
But there another problem.
In your pretty_print
method, you are attempting to call self.files_summary
, like this:
for cls, funccount, codelines, characters in self.files_summary():
But self.files_summary
isn't a function and is not callable. It is a dictionary, which also means it doesn't really make sense to use it in a for
loop like this. It's only ever going to have a single key due to the way you set it in __init__
.
If I were you, I would break this program down in individual pieces, and get each piece working correctly first before trying to tie it all together. Make good use of both the interactive Python prompt and the debugger; using breakpoint()
statements in your code to investigate the content of variables before you use them.
If I were to rewrite your code, I might do something like this:
import os
import re
from prettytable import PrettyTable
re_class = re.compile(r'class')
re_def = re.compile(r'\s*def')
class FileAnalyzer:
def __init__(self, path: str) -> None:
self.path = path
self.analyze_files()
def analyze_files(self) -> None:
self.files = []
for entry in os.listdir(self.path):
if not entry.endswith('.py'):
continue
with open(entry, "r") as pyfile:
cls = 0
funccount = 0
codelines = 0
characters = 0
for line in pyfile:
codelines += 1
characters += len(line)
if re_class.match(line):
cls += 1
elif re_def.match(line):
funccount += 1
self.files.append((entry, cls, funccount, codelines, characters))
def pretty_print(self) -> None:
pt: PrettyTable = PrettyTable(
field_names=['Filename',
'# of Classes', '# of Functions',
'# Lines of Code (Excluding Comments)',
'# of characters in file (Including Comments)'])
for path, cls, funccount, codelines, characters in self.files:
pt.add_row([path, cls, funccount, codelines, characters])
print(pt)
x = FileAnalyzer('.')
x.pretty_print()
Note that I've dropped the multuple for
loops in your analyze_files
function; there's no reason to iterate through each file multiple times. This builds up an instance variable named files
that is a list of results. The pretty_print
method simply iterates over this list.
If I run the above code in my Python scratch directory, I get:
+--------------------------+--------------+----------------+--------------------------------------+----------------------------------------------+
| Filename | # of Classes | # of Functions | # Lines of Code (Excluding Comments) | # of characters in file (Including Comments) |
+--------------------------+--------------+----------------+--------------------------------------+----------------------------------------------+
| yamltest.py | 0 | 0 | 30 | 605 |
| analyzer.py | 1 | 3 | 53 | 1467 |
| quake.py | 0 | 0 | 37 | 1035 |
| test_compute_examples.py | 1 | 1 | 10 | 264 |
| compute_examples.py | 1 | 1 | 4 | 82 |
+--------------------------+--------------+----------------+--------------------------------------+----------------------------------------------+