
Exclude empty lines and comment lines


import os


def countlines(start, lines=0, header=True, begin_start=None):
    if header:
        print('{:>10} |{:>10} | {:<20}'.format('ADDED', 'TOTAL', 'FILE'))
        print('{:->11}|{:->11}|{:->20}'.format('', '', ''))

    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isfile(thing):
            if thing.endswith('.py'):
                with open(thing, 'r') as f:
                    newlines = f.readlines()
                    newlines = list(filter(lambda l: l.replace(' ', '') not in ['\n', '\r\n'], newlines))
                    newlines = list(filter(lambda l: not l.startswith('#'), newlines))
                    newlines = len(newlines)
                    lines += newlines

                    if begin_start is not None:
                        reldir_of_thing = '.' + thing.replace(begin_start, '')
                    else:
                        reldir_of_thing = '.' + thing.replace(start, '')

                    print('{:>10} |{:>10} | {:<20}'.format(
                        newlines, lines, reldir_of_thing))

    for thing in os.listdir(start):
        thing = os.path.join(start, thing)
        if os.path.isdir(thing):
            lines = countlines(thing, lines, header=False, begin_start=start)

    return lines


countlines(r'/Documents/Python/')

Take an ordinary Python file, shown in the output as .main.py, that contains 4 lines of code: the script counts 5. How can I fix this? How do I set up the filters properly so that empty lines and comment lines are not counted?


Solution

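    1. You can modify your first filter condition: strip the line, then check that the result isn't empty. strip() removes all leading and trailing whitespace (spaces, tabs and line endings), so a whitespace-only line becomes an empty string, which is falsy.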

    lambda l: l.replace(' ', '') not in ['\n', '\r\n']
    

    becomes

    lambda l: l.strip()
    

    2. filter accepts any iterable, so there is no need to convert the result to a list after every step. Doing so is wasteful because it forces an extra pass over the data: one to build the list, and another when you filter it a second time. You can remove the intermediate calls to list() and build a list only once, after all filtering is done. You can also use filter on the file handle itself, since the file handle f is an iterable that yields one line of the file per iteration. This way, you iterate over the entire file only once.

    newlines = filter(lambda l: l.strip(), f)
    newlines = filter(lambda l: not l.strip().startswith('#'), newlines)
    num_lines = len(list(newlines))
    

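    Note that the second filter strips each line before checking for '#', so indented comments are excluded as well. I also renamed the last variable to num_lines, because a variable name should describe what it holds.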

    3. You can combine both of your filter conditions into a single lambda:

    lambda l: l.strip() and not l.strip().startswith('#')
    

    or, if you have Python 3.8+,

    lambda l: (l1 := l.strip()) and not l1.startswith('#')
    

    With a single filter there are no intermediate lists left to avoid, so point 2 becomes moot:

    num_lines = len(list(filter(lambda l: (l1 := l.strip()) and not l1.startswith('#'), f)))
    

    With the following input, this gives the correct line count:

    file.py:

    print("Hello World")
    # This is a comment
    # The next line is blank
    
    print("Bye")
    
    >>> with open('file.py') as f:
    ...    num_lines = len(list(filter(lambda l: (l1 := l.strip()) and not l1.startswith('#'), f)))
    ...    print(num_lines)
    
    Out: 2
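
Putting the points above together, here is a minimal sketch of how the counting could look as small functions. The names is_code_line, count_code_lines and count_tree are hypothetical (they are not from the question), the utf-8 encoding is an assumption about the files, and os.walk stands in for the recursive os.listdir calls in the original. It counts with sum() over a generator, so no list is built at all. Only blank lines and whole-line comments are excluded; docstrings and trailing comments still count as code.

```python
import os
import tempfile


def is_code_line(line):
    """Return True for lines that are neither blank nor whole-line comments."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith('#')


def count_code_lines(path):
    """Count code lines in one file, reading it lazily line by line."""
    # encoding='utf-8' is an assumption; adjust it to match your files.
    with open(path, 'r', encoding='utf-8') as f:
        return sum(1 for line in f if is_code_line(line))


def count_tree(start):
    """Sum count_code_lines over every .py file below start."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(start):
        for name in filenames:
            if name.endswith('.py'):
                total += count_code_lines(os.path.join(dirpath, name))
    return total


# Small self-check in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'file.py')
    with open(sample, 'w', encoding='utf-8') as f:
        f.write('print("Hello World")\n'
                '# This is a comment\n'
                '    # An indented comment\n'
                '\t\n'
                'print("Bye")\n')
    print(count_code_lines(sample))  # 2
    print(count_tree(tmp))           # 2
```

The sample file deliberately contains an indented comment and a tab-only line. Both would slip past the filters in the question, because replace(' ', '') leaves the tab in place and startswith('#') is checked against the unstripped line.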