python-3.x lowercase ordereddictionary

Convert everything in a dictionary to lower case, then filter on it?


import pandas as pd
import nltk
import os

directory = os.listdir(r"C:\...")

x = []
num = 0

for i in directory:
    x.append(pd.read_fwf("C:\\..." + i))
    x[num] = x[num].to_string()
    num += 1

So, once I have a dictionary x = [ ] populated by the read_fwf for each file in my directory:

  • I want to know how to make it so every single character is lowercase. I am having trouble understanding the syntax and how it is applied to a dictionary.

  • I want to define a filter that I can use to count occurrences of a list of words in this newly defined dictionary, e.g.,

words = ["bus", "car", "train", "aeroplane", "tram", ...]

Edit: Quick unrelated question:

Is pd.read_fwf the best way to read .txt files? If not, what else could I use?

Any help is very much appreciated. Thanks

Edit 2: Sample data and output that I want:

Sample:

The Horncastle boar's head is an early seventh-century Anglo-Saxon ornament depicting a boar that probably was once part of the crest of a helmet. It was discovered in 2002 by a metal detectorist searching in the town of Horncastle, Lincolnshire. It was reported as found treasure and acquired for £15,000 by the City and County Museum, where it is on permanent display.

Required output - changes everything in uppercase to lowercase:

the horncastle boar's head is an early seventh-century anglo-saxon ornament depicting a boar that probably was once part of the crest of a helmet. it was discovered in 2002 by a metal detectorist searching in the town of horncastle, lincolnshire. it was reported as found treasure and acquired for £15,000 by the city and county museum, where it is on permanent display.


Solution

  • You shouldn't need to use pandas or dictionaries at all. Just use Python's built-in open() function:

    # Open a file in read mode with a context manager
    with open(r'C:\path\to\your\file.txt', 'r') as file:
        # Read the file into a string
        text = file.read()
        # Use the string's lower() method to make everything lowercase
        text = text.lower()
        print(text)
    
        # Split text by whitespace into list of words
        word_list = text.split()
        # Get the number of elements in the list (the word count)
        word_count = len(word_list)
        print(word_count)
    

    If you want, you can do it in the reverse order:

    # Open a file in read mode with a context manager
    with open(r'C:\path\to\your\file.txt', 'r') as file:
        # Read the file into a string
        text = file.read()
        # Split text by whitespace into list of words
        word_list = text.split()
        # Use list comprehension to create a new list with the lower() method applied to each word.
        lowercase_word_list = [word.lower() for word in word_list]
        print(lowercase_word_list)
    

    Using a context manager for this is good practice, since it automatically closes the file for you as soon as the with block ends (that is, once the code is de-indented out of the with statement). Otherwise you would have to call open() and then file.close() yourself.
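
    As a rough sketch of what the with statement saves you from writing, the two functions below read a file in the same way; the second one closes the file manually with try/finally. The function names and the temporary demo file are illustrative assumptions, added only so the example runs on its own.

```python
import os
import tempfile


def read_with_context_manager(path):
    # The file is closed automatically when the with block ends,
    # even if an exception is raised inside it.
    with open(path, 'r', encoding='utf-8') as file:
        return file.read()


def read_with_manual_close(path):
    # Roughly what the with statement does for a file object.
    file = open(path, 'r', encoding='utf-8')
    try:
        return file.read()
    finally:
        file.close()


# Hypothetical demo file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('The Horncastle boar')
    demo_path = tmp.name

same_result = read_with_context_manager(demo_path) == read_with_manual_close(demo_path)
print(same_result)

# Clean up the demo file.
os.remove(demo_path)
```

    Both versions return the same text; the with form is shorter and harder to get wrong.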

    Another benefit of a context manager is that the file is closed even if an exception is raised inside the with block.
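
    For the second part of the question (counting how often each word from a list appears), one possible approach is collections.Counter from the standard library. This is only a minimal sketch: the function name count_target_words, the sample sentence, and the choice to strip surrounding punctuation with string.punctuation are assumptions, not something the question specifies.

```python
from collections import Counter
import string


def count_target_words(text, targets):
    """Count how often each word in targets appears in text, ignoring case."""
    # Lowercase the text, split on whitespace, and strip punctuation
    # from the start and end of each word (assumed matching rule).
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    counts = Counter(words)
    # A Counter returns 0 for words that never occurred.
    return {t: counts[t.lower()] for t in targets}


# Hypothetical sample text and word list for illustration.
sample = "The bus was late. A car passed the Bus, then a tram."
targets = ["bus", "car", "train", "aeroplane", "tram"]

result = count_target_words(sample, targets)
print(result)
```

    To apply this to every file in a directory, the text read with open() in the examples above could be passed to the same function one file at a time.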