python regex list-comprehension

List Comprehension with Regular Expressions in a Text File Python


I'm doing a Python course and want to find all the numbers in a text file using a regular expression and sum them up. Now I want to try doing it with a list comprehension.

import re

try:
    fh = open(input('Enter a file Name: '))  # prompt for the file name
except OSError:
    print('Enter an existing file name')  # error: the file could not be opened
    quit()

he = list()  # store numbers
for lines in fh:
    lines = lines.rstrip()  # rstrip() returns a new string, so keep the result
    stuff = re.findall('[0-9]+', lines)
    if len(stuff) == 0:  # skip lines with no number
        continue
    else:
        for i in stuff:
            he.append(int(i))  # add numbers to storage
print(sum(he))  # print sum of stored numbers

This is my current code. The instructor said it's possible to write the code in two lines or so.

import re
print( sum( [ ****** *** * in **********('[0-9]+',**************************.read()) ] ) )

The "*" characters should be replaced.

This text should be used to practice:

Why should you learn to write programs? 7746 12 1929 8827 Writing programs (or programming) is a very creative 7 and rewarding activity. You can write programs for many reasons, ranging from making your living to solving 8837 a difficult data analysis problem to having fun to helping 128 someone else solve a problem. This book assumes that everyone needs to know how to program ...

I know the general concept of list comprehensions, but I have no idea how to apply it here.


Solution

  • I think your instructor meant something like this:

    import re
    print(sum([int(i) for i in re.findall('[0-9]+', open(input('Enter a file Name: ')).read())]))
    

    I spread it out into more lines so we can read it more easily:

    print(
        sum([
            int(i) for i in re.findall(
                '[0-9]+', open(input('Enter a file Name: ')).read()
            )
        ])
    )
    

    To explain what is going on here, let's replace the parts of your code step by step.

    You can create the stuff variable in the same way as your original code in only one line:

    stuff = re.findall('[0-9]+', open(input('Enter a file Name: ')).read())
    

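    All I did there was move the file opening, open(input('Enter a file Name: ')), into the re.findall() call and skip the for lines in fh loop. Calling .read() returns the whole file as a single string, so re.findall() finds every number in one pass and there is no need to go line by line.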

    Then, instead of writing a for loop (for i in stuff) and appending int(i) to the he list one by one, we can use our first list comprehension:

    he = [int(i) for i in stuff]
    

    Or, if we replace stuff with what we wrote before,

    he = [int(i) for i in re.findall('[0-9]+', open(input('Enter a file Name: ')).read())]
    

    Finally, we wrap that in sum() to get the sum of all the items in the list he that we have created.
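
The one-liner above never closes the file it opens. As a rough sketch of the same idea with that tidied up, the version below reads the file inside a with block and passes a generator expression to sum() instead of building a list first. The names sum_numbers, sum_numbers_in_file and sample are made up for this illustration and are not part of the course material.

```python
import re


def sum_numbers(text):
    """Return the sum of every run of digits found in text."""
    # A generator expression reads like the list comprehension,
    # but sum() consumes it without building the list first.
    return sum(int(n) for n in re.findall(r'[0-9]+', text))


def sum_numbers_in_file(path):
    """Read the whole file at path and sum the numbers in it."""
    # The with block closes the file once it has been read.
    with open(path) as fh:
        return sum_numbers(fh.read())


# First sentences of the practice text from the question.
sample = (
    "Why should you learn to write programs? 7746 12 1929 8827 "
    "Writing programs (or programming) is a very creative 7 and "
    "rewarding activity."
)
print(sum_numbers(sample))  # prints 18521
```

To run it against a file, something like print(sum_numbers_in_file(input('Enter a file Name: '))) should behave like the original program, minus the error message for a missing file.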