Tags: python, list, with-statement, readlines

Python - Using .readlines() with .rstrip() and then store all words into a list


I want to strip the trailing \n character (using .rstrip('\n')) from each line of a text file (dictionary.txt) that contains 120,000+ words, count the lines to get the number of words in the file (each word is on its own line), and finally store all the words in a list.

At the moment, the code below prints the number of lines, but it doesn't strip the \n character or store the words in a list.

    def lines_count():
        with open('dictionary.txt') as file:
            print(len(file.readlines()))

Solution

  • If you want the list of lines without the trailing newline character, you can read the file as a single string with file_obj.read() and then call str.splitlines() on it. That is not strictly necessary, though: the file object returned by open() is already an iterator over its lines, so you can strip the trailing newline while processing each line, or pass str.strip to map() to create an iterator of stripped lines:

    with open('dictionary.txt') as f:
        # map() is lazy, so consume it while the file is still open
        stripped_lines = list(map(str.strip, f))
    

    But if you just want to count the words in a Pythonic way, you can use a generator expression inside sum(), like the following:

    with open('dictionary.txt') as f:
        word_count = sum(len(line.split()) for line in f)
    

    Note that there is no need to strip the newlines when you split a line, because str.split() with no arguments discards leading and trailing whitespace.

    For example:

    In [14]: 'sd f\n'.split()
    Out[14]: ['sd', 'f']
    

    But if you still want all the words in a list, you can use a list comprehension instead of a generator expression:

    with open('dictionary.txt') as f:
        all_words = [word for line in f for word in line.split()]
        word_count = len(all_words)
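
The read() plus splitlines() approach mentioned at the start of the answer is described there without code, so here is a minimal sketch of it, next to a per-line rstrip('\n') variant that matches what the question asked for. The function names load_words and load_words_rstrip are hypothetical, chosen only for illustration, and the sketch assumes a plain text file with one word per line. For such a file both should give the same list, although splitlines() also splits on a few less common line-boundary characters.

```python
def load_words(path):
    """Return every line of the file as a list, without trailing newlines."""
    with open(path) as f:
        # read() pulls the whole file into one string; splitlines() then
        # splits on line boundaries and drops the newline characters.
        return f.read().splitlines()


def load_words_rstrip(path):
    """Build the same kind of list by stripping each line during iteration."""
    with open(path) as f:
        # rstrip('\n') removes only the trailing newline, so any other
        # whitespace around a word is left untouched.
        return [line.rstrip('\n') for line in f]
```

With either function, something like `words = load_words('dictionary.txt')` would give the list, and `len(words)` the number of words, since each word sits on its own line.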